Sequence of events
The Sydney Terminal Control Unit (TCU) provided air traffic services within 45 NM of Sydney Airport, up to a height of 28,000 ft. The normal power supply for the TCU consisted of mains or generator supply feeding two separate, independent Uninterruptible Power Supply (UPS) units, the "A" and "B" systems, which shared the electrical load.
A routine six-monthly performance inspection of the Sydney TCU UPS, part of The Australian Advanced Air Traffic System (TAAATS), was scheduled from 1500 Eastern Standard Time on 6 July until 1800 the next day, under an approved works plan. The inspection had been brought forward to avoid the Olympics period.
The electrical technical officers scheduled to conduct the inspection were delayed by higher priority tasking associated with the Sydney control tower. Approval to start the inspection at 1800 instead was then obtained through the normal procedure, via the Melbourne Technical Customer Interface and the Sydney TCU Traffic Manager. The Traffic Manager's approval of works plans was not based on any structured risk assessment process. Instead, it relied on the Traffic Manager's experience in assessing and forecasting aircraft movements and staff availability, and on knowledge of the likelihood and consequences of a failure of the TCU power supply. The works plan's statement that there would be no interruption to service, together with the stability of the power supply during similar past works, supported the Traffic Manager's decision to approve the works.
Work started on the performance inspection at 1800:36. At 1822:24 the Sydney TCU sustained a total loss of electrical power. Power was restored at 1822:38, 14 seconds later.
The loss of electrical power caused the following to fail: the TCU Air Traffic Control (ATC) workstations, software switching of voice communications channels, satellite communications, provision of the Sydney Terminal Approach Radar to Melbourne and Brisbane, and the operations room lighting.
The ATC workstations automatically began rebooting after the initial 14-second power outage and remained unavailable for a further 7 to 10 minutes while they rebooted.
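Combining the recorded times with the reported reboot duration gives the overall period for which the TCU workstations were unavailable. The Python sketch below is illustrative only: it assumes that the 7 to 10 minute reboot began at 1822:38, when power was restored, and the variable names are not taken from the report.

```python
from datetime import datetime, timedelta

# Times recorded in the report (6 July 2000, Eastern Standard Time).
POWER_LOST = datetime(2000, 7, 6, 18, 22, 24)
POWER_RESTORED = datetime(2000, 7, 6, 18, 22, 38)

# Reported range for the ATC workstation reboot after power returned.
REBOOT_MIN = timedelta(minutes=7)
REBOOT_MAX = timedelta(minutes=10)

power_outage = POWER_RESTORED - POWER_LOST

# Assumption: workstations started rebooting only once power was back, so
# the total time without them is the power outage plus the reboot time.
shortest = power_outage + REBOOT_MIN
longest = power_outage + REBOOT_MAX

# Estimated window in which the workstations would have come back.
available_from = POWER_RESTORED + REBOOT_MIN
available_by = POWER_RESTORED + REBOOT_MAX

print(power_outage.total_seconds())  # 14.0
print(shortest, longest)  # 0:07:14 0:10:14
print(available_from.strftime("%H%M:%S"), available_by.strftime("%H%M:%S"))  # 1829:38 1832:38
```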
The air traffic controllers in the TCU were unable to determine the relative positions of aircraft under their jurisdiction for about 7 to 10 minutes. By using the emergency bypass air/ground radio, controllers were able to direct flight crews to keep a visual lookout for aircraft and to turn on their Traffic Alert and Collision Avoidance System (TCAS).
Power was not lost in the Sydney Control Tower, and all systems there continued to operate normally except the controller workstations, which went into a "degraded mode" following the initial power outage. Nevertheless, the Sydney Tower controllers were able to provide normal ATC services to aircraft under their jurisdiction.
Full radar display from the Terminal Area Radar was available on the Tower Data Processing and Display System.
The Brisbane and Melbourne TAAATS Centres provided radar services and limited support during the period that the Sydney TCU ATC workstations were not available.
A review of the air traffic control data recorded at the TAAATS centre in Melbourne showed that there was no infringement of separation standards.
The works plan stated that an inspection of the Sydney TCU power supply would be carried out between 1500 on 6 June and 1800 on 7 June; it wrongly gave the month as June instead of July. The works plan also stated that during the work one UPS and generator set would be available and there would be no interruption to the power supply.
The two UPS units at the Sydney TCU had been upgraded from a 6-pulse to a 12-pulse system in February 1999, and had been reported as stable and as not having caused any interruption to service since commissioning.
The "A" and "B" UPS systems normally shared the electrical load for the TCU. The performance inspection required the full TCU load to be placed on the "B" system while the "A" system was tested off-line. The electrical technical officers reported that the "B" system was switched from mains bypass to take the full TCU load, at which point power to the TCU was lost. They also reported that all indications on the "B" system were normal, except that the output currents read zero.
Subsequent independent testing of the UPS equipment by Airservices and the UPS manufacturer found the system performed satisfactorily. Exhaustive testing of a similar UPS at the manufacturer's testing facility could not reproduce a power loss similar to that experienced by the Sydney TCU.
During testing, Airservices identified an inconsistency between the rectifier current limit settings of the "A" and "B" systems. The "B" system rectifier current limit was set at 0.6 V, while the "A" system had a higher setting of 1.945 V. The lower setting on the "B" system may have been introduced during training on the "B" system in May 2000. Airservices' initial investigation found that this setting may have caused the power outage. However, further investigation determined that the lower rectifier current limit setting of the "B" system would not, by itself, have caused the power outage to the TCU.
The state of various conditions within each UPS unit is monitored and recorded independently by the National Technical Monitoring System. There are inconsistencies between these records and site staff's recollections of the actions immediately before the power loss. The recorded data is not comprehensive enough to determine conclusively the cause of the power loss.
The Brisbane and Melbourne TAAATS centres operated normally and maintained radar services during the power outage, but lost the supply of Sydney Terminal Area and Mount Boyce (Blue Mountains, west of Sydney) radar data.
The Sydney TCU "fallback" radar system, designed to help controllers maintain situational awareness in the event of a complete TAAATS failure, also failed because it was powered from the same supply that was lost.
Display of the Sydney Terminal Area Radar was available in Sydney Tower on the Tower Data Processing and Display System.
Recordings of air/ground and ground/ground communications for Sydney Tower and TCU were not available for 73 seconds from 1821:37 until 1822:51.
The Voice Switch panels at each console failed, causing the loss of normal air/ground and ground/ground communications. Bypass air/ground communication remained available at consoles that use remote Very High Frequency (VHF) outlets (50-volt battery-powered communications equipment), and could be used within a few seconds after the 14-second power outage. Some controllers who tried to use the air/ground bypass equipment during the 14-second outage found that it was not available. As a result, the bypass equipment was used inconsistently, depending on whether a controller had first tried it within those 14 seconds or only afterwards.
Following the power outage, TCU controllers broadcast advice of the failure to flight crews and advised them to keep a visual lookout and to ensure that their TCAS was switched on. Some inbound aircraft were transferred to Sydney Tower controllers, and outbound aircraft were transferred to the Melbourne and Brisbane TAAATS centres.
There was no air/ground bypass equipment fitted to the Sydney TCU Directed Traffic Information position.
The individual voice switch panels reverted to a ground/ground bypass mode that gave controllers access to the PABX. However, that system was not used because the secondary display windows that stored the telephone numbers were not accessible.
The Voice Switch Management Station, which controlled restarting of the Voice Switch, had an electronic latching switch that activated during the initial power outage. A radio technician had to reset this switch manually before the Voice Switch could reboot automatically. As a result, the Voice Switch was not available for TCU controller use until 1832.
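The restart interlock described above can be illustrated as a small state machine. The Python sketch below is hypothetical: the class and method names are invented for illustration, it ignores the time the reboot itself takes, and it does not represent the actual Voice Switch Management Station design. It shows only that restoring power does not restart the Voice Switch while the latch remains set.

```python
class VoiceSwitchRestartModel:
    """Hypothetical model of the Voice Switch restart interlock.

    Names are illustrative only; they do not describe the real
    Voice Switch Management Station hardware or software.
    """

    def __init__(self) -> None:
        self.latch_set = False       # electronic latching switch state
        self.voice_switch_up = True  # Voice Switch available to controllers

    def power_interruption(self) -> None:
        # Power loss trips the latch and takes the Voice Switch down.
        self.latch_set = True
        self.voice_switch_up = False

    def power_restored(self) -> bool:
        # Automatic reboot is only permitted while the latch is clear.
        if not self.latch_set:
            self.voice_switch_up = True
        return self.voice_switch_up

    def manual_reset(self) -> bool:
        # A radio technician clears the latch; automatic reboot can proceed.
        self.latch_set = False
        return self.power_restored()


model = VoiceSwitchRestartModel()
model.power_interruption()
up_after_power_returns = model.power_restored()   # False: latch still set
up_after_technician_reset = model.manual_reset()  # True
```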
Sydney Tower air/ground communications were unaffected. However, ground/ground communications from Sydney Tower controllers to the TCU controllers were unavailable until 1832.
The Airport Rescue and Fire Fighting fire alarm system continued to work without any equipment error recorded.
The Sydney TCU supervisor's PABX telephones, which used separate handsets and dial pads, remained available during the power outage.
The Sydney TCU Team Leader advised the Sydney Tower Traffic Management Coordinator by telephone that "we've lost everything down here". The Tower Traffic Management Coordinator replied "so have we", without understanding the extent of the failure in the TCU. Consequently, the TCU Team Leader did not request radar support from the Tower Traffic Management Coordinator.
Brisbane and Melbourne TAAATS centres were advised of the power outage by telephone from the Sydney TCU Supervisor's position and by relay of messages from aircraft near Sydney by using the air/ground bypass system.
There was no message on the Computerised Automatic Terminal Information System advising flight crews of the reduced services being provided by the Sydney TCU.
Air traffic control equipment
The 14-second power interruption caused the Sydney TCU TAAATS workstations to automatically reboot. The estimated time for the computers to reboot was between 7 and 10 minutes. This left the controllers without any air situation display for between 7 and 10 minutes.
During the outage, tools were not available for the controllers to maintain situational awareness.
The Sydney Tower air situation display lost new code/callsign correlation but kept existing code/callsign correlations, and automatically went into bypass (a degraded mode) because of the loss of the TCU radar data processor. This limited radar display and the Tower Data Processing and Display System were available for reference by the TCU controllers, but were not used because of the misunderstanding about the extent of the equipment outage.
The segregated airspace design used in the Sydney TCU provided adequate short-term protection for aircraft to remain separated during the power outage without any ATC intervention.
The investigation found that, for operations at Sydney Airport during the curfew period between 2300 and 0600, there was no airspace segregation between the arrival and departure stages of flight.
The Standard Terminal Arrival Route (STAR) communication failure procedures advise flight crews to track to the Sydney VOR and then fly the most suitable instrument approach to the nominated runway, in accordance with the emergency section of the Enroute Supplement Australia (ERSA). The investigation found that the design of some instrument approaches into Sydney precluded flight crews from commencing an instrument approach from overhead the Sydney VOR during a communication failure. For example, flight crews who had been advised before the power failure to expect runway 34 Right would track to the VOR, but would then be unable to start the runway 34 Right ILS approach, because that approach requires radar vectors to intercept final approach.
The UPS switchboard is physically arranged so that, when facing it, the "A" system is on the left and the "B" system is on the right. This is the exact opposite of the schematic diagram for this UPS system, on which the "A" system is on the right and the "B" system is on the left.
There were two different UPS handbooks available to maintenance services staff. One was the Airservices-controlled document, which was not current for the equipment fitted at Sydney; the other was an updated handbook issued by the manufacturer during staff training in May 2000. This uncontrolled handbook contained the incorrect rectifier current limit setting. The switching procedure for removing a UPS from service and returning it to service had been revised between the two handbooks.
Airways engineering instructions
The electrical performance inspection was carried out under Airways Engineering Instruction (AEI) AEI-3.4053, issue number 4. The purpose of this AEI is "to define the electrical tasks to be carried out during the performance inspection of the static UPS equipment... installed at Airservices Australia facilities". The AEI is non-prescriptive: it is a generic instruction covering all UPS equipment installed at Airservices facilities.
The document does not clearly define the tasks to be conducted, nor does it refer to the manufacturer's instructions about how to correctly carry out those tasks.
The AEI required the full load of the Sydney TCU to be placed on one UPS system. This removes the redundancy provided by the normal configuration of two independent UPS units and creates a single point of failure.
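The effect of removing this redundancy can be shown with a simplified probability model. The Python sketch below assumes that either UPS unit can carry the full TCU load on its own and that the two units fail independently; the failure probability used is hypothetical and is not drawn from the investigation. Independence is a strong assumption: it excludes common-mode failures, and the cause of this power loss was never determined.

```python
# Simplified model of TCU power availability, assuming:
#  - either UPS unit can carry the full TCU load on its own, and
#  - the "A" and "B" units fail independently (no common-mode failures).
# Probabilities are per period of interest and are hypothetical.


def p_loss_both_in_service(p_a: float, p_b: float) -> float:
    """TCU power is lost only if both units fail (parallel redundancy)."""
    return p_a * p_b


def p_loss_single_unit(p_b: float) -> float:
    """With "A" off-line, any failure of "B" removes TCU power."""
    return p_b


P_UNIT = 1e-3  # hypothetical failure probability for one unit

shared = p_loss_both_in_service(P_UNIT, P_UNIT)
single = p_loss_single_unit(P_UNIT)
print(f"both units in service: {shared:.0e}")   # 1e-06
print(f"single unit in service: {single:.0e}")  # 1e-03
```

Under these assumptions, taking one unit off-line raises the probability of losing TCU power from the product of the two unit probabilities to the probability for the remaining unit alone.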
The separation assurance techniques used by the Sydney TCU controllers before the power outage aided in maintaining separation standards during the power outage.
Degraded modes procedures were available in check list form for the TCU controllers. The degraded modes check lists were "designed to enable TAAATS controller's to be able to quickly identify degraded or abnormal system operation and to list the procedures:
- to enable an immediate response to maintain safety
- to be adopted for continued safe operations in the degraded mode
- to be adopted when upgrade to normal operations is available"
The check lists were not designed for, and did not address, the multiple failures associated with the power outage. It had been expected that the reliability and redundancy incorporated in the system design would preclude a total electrical failure.
The degraded modes check list recommended that "Operators should maintain familiarity with operating in these degraded modes and the need to prioritise actions in accordance with the following:
- maintain separation by use of alternative standards if necessary
- issue traffic information as required
- advise others of your degraded status
- adopt full verbal coordination and handoff procedures
- maintain and modify the flight data record (FDR)"
Sydney Tower controllers stopped aircraft departures from Sydney in accordance with degraded mode procedures.
Most of the controllers rostered on duty during the power outage had completed their TAAATS conversion training in 1999; some had been more recently trained. The ATC TAAATS training did not cover or simulate the conditions experienced during the power outage.
Refresher training for elements of degraded operation, such as the use of the air/ground bypass system, had not been conducted since the early TAAATS conversion training in 1999.
Flight crew procedures
The flight crews involved in the incident followed communication failure procedures as published in the Aeronautical Information Publication (AIP). They also offered communications support to the Sydney TCU controllers by relaying controller instructions and advice of the power outage to other ATS units.
Maintenance services electrical personnel
All Sydney electrical technical officers took part in a training course provided by the UPS manufacturer in Sydney from 15 to 17 May 2000. The training was conducted on the operational UPS equipment.
Electrical technical officers usually worked in pairs when servicing the UPS equipment. These officers had not undertaken any form of team resource management training to define their tasks when working as a team.
There were two electrical technical officers involved in testing the UPS. One was in the middle of his shift and the other had started duty at 0600 and had extended his shift to conduct the works plan that led to the occurrence.
According to the Airservices National Technical Certification program (TechCert), both officers held suitable electrical qualifications to conduct the test.
The Airservices TechCert program assesses technical officers' knowledge of how to safely remove equipment from, and restore it to, the national airways system, and their ability to do so. The program also requires staff to have worked on the systems within predetermined timeframes for their TechCert certification to remain current. Without this currency, staff cannot remove equipment from, or restore it to, the national airways system.
The Sydney Contingency Plan was not activated because of the short duration of the power outage.
The Sydney contingency plan did not include any reference to the loss of the Directed Traffic Information service.
The contingency plan referred to radar fail procedures and directed users to the degraded modes handbook, which contained no radar fail procedure. It was noted that this document was under review at the time of the incident.
The investigation could not determine the cause of the power outage in the Sydney TCU.
Although the UPS maintenance was initially scheduled to take place between 1500 and 1800, the works plan covered 27 hours (1500 on 6 July to 1800 on 7 July), which would have left the TCU exposed to a single point of failure during busy traffic periods. The parties approving the works plan did not appear to understand the significance of running the TCU on a single UPS for 27 hours. Consideration should be given to conducting maintenance on critical equipment when the level of operations is low.
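The effect of the length of the works plan can be illustrated with a constant-failure-rate (exponential) model, in which the probability of at least one failure during an exposure window of t hours is 1 - e^(-λt). The Python sketch below compares a three-hour window, such as 1500 to 1800, with the full 27-hour works plan. The failure rate λ is hypothetical, and the model says nothing about traffic levels, which determine the consequences of a failure.

```python
import math

# Hypothetical constant failure rate for a single UPS unit carrying the
# full TCU load (failures per hour). Not a figure from the investigation.
RATE_PER_HOUR = 1e-4


def p_failure_in_window(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure in a window of the given length.

    Uses the exponential model P = 1 - exp(-rate * hours); expm1 keeps the
    result accurate when rate * hours is small.
    """
    return -math.expm1(-rate_per_hour * hours)


for hours in (3, 27):
    p = p_failure_in_window(RATE_PER_HOUR, hours)
    print(f"{hours:>2} h single-UPS exposure: P(failure) = {p:.2e}")
```

For small values of λt the probability grows almost linearly with the window length, so under these assumptions a 27-hour exposure carries roughly nine times the risk of a three-hour one.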
In western culture, people are taught to read from left to right, and information tends to be processed in that order as a result of this learned behaviour. The reversed layout of the schematic diagram, with the "A" system on the right and the "B" system on the left, therefore has the potential to confuse an operator about which system they should be switching at any given time. Switching errors may occur because the schematic diagram is transposed relative to the actual equipment.
The Airways engineering instructions (AEIs) relating to the UPS equipment were inadequate for use by the electrical technical officers. Because those documents were written in broad terms, they contained insufficient procedures to ensure safe operations while the required tasks were being conducted. The electrical technical officers were expected to know how things were done, and the AEIs did not adequately direct how activities and tasks were to be carried out.
The practice of putting safety defences in place is a measure used to mitigate human error. Two of the functions of defences are to create understanding and awareness of local hazards and to give clear guidance on how to operate (Reason, 1997). The AEIs for UPS equipment failed to meet those two basic safety defence functions, which constituted a latent failure in the Airservices system.
Technicians need to be familiar with many different types of equipment. This range of equipment does not allow technicians to gain and maintain enough competency and currency on every type they are required to repair and service, and the resulting lack of familiarity makes competency-related errors in maintenance procedures more likely. Greater clarity about the correct procedural steps to be taken by the technical officers would lessen the possibility of recurrence.
The range of equipment serviced by Airservices electrical technical officers has grown to the point where a defined, specific task list is needed for each type of equipment. Such lists would correct the lack of safety defences by creating understanding and awareness of local hazards and by giving technicians the clear guidance they need to conduct their tasks. This in turn would help to ensure that occupational health and safety is not compromised. Specific guidelines, including the steps involved in each process needed to complete a task, need to be developed. This will decrease the possibility of an error occurring because of an incorrect sequence of events or an incorrect procedure being carried out.
The TCU Team Leader's advice that "we've lost everything down here" meant something different from the meaning inferred by the Tower Traffic Management (TTM) Coordinator. This was probably because of the TTM Coordinator's prior experience of the "loss" of incoming information during degraded modes operations within TAAATS. Language interpretation is often based on learned expectations. When words or phrases are spoken, the listener makes a semantic and grammatical interpretation of them; in complex control and operation tasks, this interpretation is based on the context in which such information has usually been received. Misunderstandings can arise when the underlying meaning is wrongly inferred.
The information given by the TCU Team Leader appears to have been filtered through the TTM Coordinator's previously learned expectations. This misunderstanding prevented the TCU controllers from getting help from the tower controllers. It may also have contributed to the absence of a broadcast on the Computerised Automatic Terminal Information System.
The TCU Team Leader's assumption that the TTM Coordinator had lost a similar range of facilities was based on his interpretation of the underlying problem, and was not correct. There was no procedure for describing types or levels of emergency that would have allowed either party to clarify the intended meaning. In flight operations, by contrast, an emergency can be graded as either a Pan call or a Mayday call. This gap needs to be addressed in the ATC environment.
The notion of managing all available resources (information, equipment and people) in the most effective and efficient manner at any given time is not new in the aviation industry. The concepts of Team Resource Management (TRM) are the same as those of Crew Resource Management (CRM). However, little consideration appears to have been given to this aspect of the work conducted by electrical technicians.
There is very little research on the use of TRM with technicians in the aviation industry. However, based on the successful implementation of CRM with flight crew, air traffic controllers and aircraft maintenance engineers, there is no reason to believe that this success cannot be transferred to technicians.
To achieve this efficient working relationship, it is necessary to analyse what jobs technicians perform and which tasks need two people to accomplish. Team Resource Management can be designed to help achieve the best use of resources to gain the desired outcomes.
Although technicians work together when conducting potentially life-threatening work, there were no procedures or practices in place to manage the two technicians' efforts in unison, with each having specifically demarcated roles and responsibilities, so that work input and output could be managed to achieve the desired goals in the most effective manner.
There appeared to be some overlap in the tasks conducted by each technician at the Sydney TCU. This can lead to confusion about who is to do what, and when each task is to be performed. Role clarification and demarcation of tasks are needed to decrease the chance of a task being done twice or not being done at all. To achieve this, it is necessary to establish what the tasks are, when they are to be done, and who should complete them. A process review of the tasks performed and of the skills and expertise needed for each task could accomplish this.
In its investigation report dated 23 October 2000, Airservices Australia recommended that:
- The effectiveness and efficiency of procedures and defences associated with the incident should be analysed and gaps or weaknesses addressed as a matter of priority (including Systems Configuration Controls).
- The configuration of the Sydney UPS System should be reviewed as a matter of urgency.
- A review of the document control procedures for Maintenance Services documentation should be conducted.
- The adequacy of design and configuration of the loads for the UPS should be reviewed with regard to maintaining the safety of the Air Traffic System and exposure to single point failures.
- The risk assessment process used in the assessment of the operational impact associated with works planned activities should be reviewed.
- The content and frequency of refresher training for ATC in degraded modes procedures and simulation should be reviewed as a matter of priority to ensure safe and timely actions through practised response and full understanding of system responses.
- National and local contingency plans in Brisbane, Melbourne and Sydney should be reviewed with regard to outcomes of the incident.
- Published communication failure procedures should be reviewed, and an analysis conducted of aircrew responses to those procedures during the incident.
- The directed traffic information position at Sydney Terminal Control Unit be fitted with an air to ground bypass.
- The effectiveness and efficiency of procedures and ATC responses associated with the incident should be reviewed by a local safety panel.
- A risk assessment of TAAATS workstations be conducted to establish whether the Sydney Terminal Control Unit workstations should be redesigned to incorporate mini-UPS units.
- Consideration be given to the development of ATC workstations that can retain the air situation display, to support the situational awareness of air traffic controllers.
- Consideration be given to providing printed telephone numbers of adjoining ATC units, combined with a separate physical keypad for accessing the PABX in ground to ground bypass mode, in the event of a TAAATS screen failure.
- A Failure Modes, Effects and Criticality Analysis be conducted at all remote Terminal Control Units to measure and address the risk associated with the hazard of UPS power failure.
- A review of the configuration of the Sydney TCU "fallback" system source of power be conducted.
As a result of this investigation the Australian Transport Safety Bureau makes the following recommendations.
The Australian Transport Safety Bureau recommends that Airservices Australia introduce Team Resource Management concepts as part of electrical technical officer initial and recurrent training.
The Australian Transport Safety Bureau recommends that Airservices Australia perform a task analysis to determine what tasks electrical technical officers carry out. From this task analysis, role clarification should be developed.
The Australian Transport Safety Bureau recommends that Airservices Australia review the content of Airways Engineering Instructions for the maintenance and testing of UPS equipment.
The Australian Transport Safety Bureau recommends that Airservices Australia review the design of STARs and instrument approaches with a view to improving separation assurance during communications or radar failure.
The Australian Transport Safety Bureau recommends that Airservices Australia review curfew operations with regard to providing a greater level of airspace segregation.
The Australian Transport Safety Bureau recommends that Airservices Australia review the training of electrical technical officers on operational equipment.
|Field|Detail|
|---|---|
|Date|06 July 2000|
|Time|1822 hours EST|
|State|New South Wales|
|Release date|05 April 2001|
|Investigation status|Completed|
|Report status|Final|
|Occurrence category|Incident|
|Highest injury level|None|
|Aircraft manufacturer|No Aircraft Involved|
|Damage to aircraft|Nil|