The Walt Disney Company Manager, Service Availability in Celebration, Florida
At Disney, we’re storytellers. We make the impossible, possible. We do this by using and developing cutting-edge technology and pushing the envelope to bring stories to life through our movies, products, interactive games, parks and resorts, and media networks. Now is your chance to join our talented team that delivers unparalleled creative content to audiences around the world.
The Disney Technology Operations Center (DTOC) is a 24x7x365 critical services operation center that serves as the central hub for orchestrating service availability. Its primary focus is to rapidly respond to outages, correlate them, and reduce their impact. We are accountable for identifying and facilitating the resolution of service-impacting events, and for collaborating with other technology teams to prevent future impact through proactive event management and incident and problem analysis. The DTOC drives the execution of the major incident process, including communication to executives and key stakeholders. The DTOC owns and executes the IT Emergency Operations Center Crisis Management plan and process, and is responsible for maturing the plan and integrating it into the overall Corporate Crisis Management and TWDC programs. The DTOC also provides ongoing first- and second-level technical support for requests, performs validation procedures for routine system/service checks, and provides proactive monitoring and communication for HyperCare of significant business events.
The Service Availability Manager is responsible for the execution and continuous improvement of the Event, Incident, Major Incident, Crisis Management, HyperCare, and Problem Management processes within the DTOC. The Service Availability Manager manages technical resources, individual contributors and/or supplier personnel, both directly and indirectly.
The Service Availability Manager will work onsite at the DTOC location(s) on an assigned shift scheduled to meet business needs. Current team shifts are split between Sun-Wed (‘Front halves’) and Wed-Sat (‘Back halves’) and are staggered over a 24-hour period in three shifts: Days, Swings and Graves.
The Service Availability Manager will also be required to perform after-hours work on a rotation basis. Shifts and after-hours work require working nights, holidays and weekends. While shifts are assigned on a fixed basis, planned end times are approximate, and there may be a need to stay longer, cover other shifts, fill in for PTO, and cover outages. Shift assignments are reviewed and adapted as business needs change. Shifts are subject to change, and all SAMs must be able and willing to work any shift, including overnight, in addition to after-hours work.
Drive the efficiency and effectiveness of the Event, Incident, Major Incident, Request Fulfillment and Problem Management processes
Partner with suppliers to ensure third parties fulfill their contractual obligations with regard to response, diagnosis, resolution and providing RCA-related information and data
Identify service improvement opportunities through trend analysis, proactive techniques, and after-action reviews
Publish DTOC utilization and service performance metrics regularly
Identify and drive service availability improvement opportunities by executing leading practices
Ensure that all DTOC services are designed to deliver the levels of availability required by the business, and validate that the final design meets the minimum levels of availability agreed with the business for IT services
Proactively escalate any gaps to leadership
Participate in creating, maintaining, and regularly reviewing department procedures and operational readiness plans and posture, aimed at improving the overall availability of IT services and infrastructure components, to ensure that existing and future business availability requirements can be met. This includes compiling daily operational reports and facilitating operational readiness calls.
Ensure the DTOC effectively monitors available tools and systems to support high availability and swift response to potential and actual outages
Serve as incident commander on service outage calls, providing command and control of the call and orchestrating the recovery activities of the DTOC and other technology teams to drive fast restoration of service without added risk to the organization
Effectively apply Kepner-Tregoe Incident Analysis and Problem Analysis techniques during and after incidents, and ensure staff do the same
Identify and participate in department improvement initiatives covering staff capability development, process re-engineering and related improvements
During outages, consistently provide timely Situation Reports, ensure that work streams toward resolution are clearly articulated following department procedures, and ensure that business impacts are obtained and communicated
Manage direct and indirect DTOC individual contributor staff to ensure the 100% on-site coverage required to effectively support incidents, service requests, proactive health checks and HyperCare services
Occasionally during the year, serve as a ride-out crew member on-site in the critical operations center during severe weather or other crisis situations (e.g., hurricanes or snowstorms), until the threat condition passes and you are released
Influence and socialize DTOC solutions, practices, roles, responsibilities, and processes
Participate in creating, maintaining, and regularly reviewing assessments of the overall readiness of services for existing and future business needs, including Operational Readiness Reviews (ORRs)
Contribute to the development and sustainment of an enterprise level incident, event, and availability management strategy
Participate in the development and governance of service level agreements
People Management and Leadership
Provide day-to-day leadership to the DTOC team. This includes developing resources, defining roles and expectations, holding team members accountable for the quality and timeliness of work assignments, training, and ensuring staff meet all organizational requirements and objectives
Work with staffing agencies and/or managed service partners to ensure the proper level of performance is obtained from contract resources, and with service providers to ensure the proper level of performance and value is obtained
Perform regular coaching and performance feedback for direct and indirect personnel reporting to this role. Ensure feedback is documented and relayed using approved tools and practices.
Perform talent acquisition activities for vacant roles
Basic Qualifications:
3+ years’ experience supporting converged infrastructure stacks, including application, hypervisor, compute, storage and networking
3+ years leading incident recovery with multidisciplinary, geographically dispersed teams in a Fortune 500 organization
3+ years of experience in either a large IT shared services organization or outsourced environment
Experience leading technical recovery of major incidents for a Fortune 500 organization
Experience with hands-on support of cloud operations on one or more of AWS, Google Cloud or Azure
Experience supporting diverse portfolios, multiple business applications and IT services
Experience working in a 24x7 IT operations environment.
Demonstrated experience with Service and Event Management tools.
Demonstrated experience in systems integration, application infrastructure support and middleware operations.
Demonstrated management skills, both in resource management and in overall process control
Proven experience and understanding of root cause analysis techniques
Proven ability to be detail-, deadline-, and results-oriented
Strong leadership skills with the ability to motivate and encourage others
Ability to manage competing priorities and workflow
Solid interpersonal skills across written, oral, and face-to-face communication
Practical experience with influence and negotiation methods and techniques
Ability to serve as a mentor and coach
Strong customer service orientation, seeking opportunities to serve clients
Preferred Qualifications:
ITIL V3 Foundation certification
Kepner-Tregoe experience highly desired
Experience in enterprise scale IT departments highly desired
Experience with outsourced IT environments highly desired
Required Education:
- Bachelor’s degree in Information Systems, Computer Science or equivalent experience
Job ID: 585449BR
Job Posting Company: The Walt Disney Company (Corporate)