Client Overview

As one of the largest energy companies in the US and a Fortune-100 electric utility company, the Client’s utilities provide energy to more than ten million customers. Their large territory and customer-centric values are why they maintain such a high quality standard for their IT systems.

When it comes to IT system storm readiness, staging an end-to-end performance test across twenty-six systems was a significant challenge, but one that had to be met in the most thorough and responsible way. Providing tangible proof of IT reliability and performance is a key tenet of their mission to provide reliable service, rain or shine.

Business-Critical Systems Require Quality-Critical Testing

Our Client saw its service territories battered by two successive storms that inflicted heavy damage through rain, snow, and sustained high winds, with gusts up to 70 mph, for more than 48 hours.

The combined effect was 1.7 million service interruptions across the region. The event was significant enough to require external help from restoration crews sent by energy companies in more than 16 states.

The storms also impacted critical IT systems, straining them to capacity and causing outages and slow response times. Outage status information typically available to customers through channels such as text messaging and mobile apps became delayed or unavailable. What’s more, restoration teams experienced significant delays in their ability to diagnose, dispatch, and report status on field efforts.

The investigation that followed the incident revealed several root causes, one of which was insufficient end-to-end performance testing of mission-critical storm applications.

The Client set out to close this gap by running a first-of-its-kind end-to-end performance test across 26 applications simultaneously. The test needed to be carefully planned and coordinated among several application support teams, including utility personnel and vendor support teams both onshore and offshore.

The objective was to connect the test systems of all 26 applications and apply a stairstep-like load profile that exercised scenarios with 25%, 50%, and 75% of customers out of power. The profile was designed to validate the system improvements implemented in response to the recent storms and to further stress the systems in the hopes of identifying vulnerabilities to be addressed in future improvement efforts.
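The stairstep profile described above can be sketched as a simple schedule of load stages. This is only an illustration: the 25%, 50%, and 75% fractions come from the text, while the customer total (rounded to ten million), the one-hour hold per step, and the names LoadStep and stairstep_profile are assumptions made for the example, not details of the Client's actual test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadStep:
    fraction_out: float   # share of customers without power in this step
    customers_out: int    # absolute number of customers simulated as out
    start_minute: int     # when the step begins, relative to test start
    end_minute: int       # when the step ends


def stairstep_profile(total_customers, fractions=(0.25, 0.50, 0.75), hold_minutes=60):
    """Build a stairstep schedule: each step holds a higher outage level.

    hold_minutes is a hypothetical hold time; the source does not state one.
    """
    steps = []
    for index, fraction in enumerate(fractions):
        steps.append(
            LoadStep(
                fraction_out=fraction,
                customers_out=int(round(total_customers * fraction)),
                start_minute=index * hold_minutes,
                end_minute=(index + 1) * hold_minutes,
            )
        )
    return steps


# Assumed round figure; the source only says "more than ten million customers".
profile = stairstep_profile(total_customers=10_000_000)

for step in profile:
    print(
        f"{step.start_minute:>3}-{step.end_minute:<3} min: "
        f"{step.fraction_out:.0%} out ({step.customers_out:,} customers)"
    )
```

Holding each level before stepping up makes it easier to tell at which outage level a given system starts to degrade.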

The Client needed a partner that knew their business, was familiar with their IT systems, and had the necessary technical and process expertise to design and implement such a large-scale test.

The First of Its Kind End-to-End Performance Test

Qualitest applied our proven performance testing approach, scaling it to meet the needs of multiple applications and teams. Our approach progressed through three clear and well-defined phases:

  • Phase 1 – Test Strategy Development
  • Phase 2 – Test Implementation
  • Phase 3 – Continual Improvement

Phase 1 – Test Strategy Development

In this phase, Qualitest performance test engineers and architects undertook the following tasks:

  • Storm Scenario Requirements Gathering: Qualitest interviewed Client personnel to understand the historical worst-case storm profile and its implications for transaction rates and data volumes both internal to and between IT systems. High-level requirements were translated into a detailed and measurable set of test requirements.
  • Current State Assessment: Our team took a close look at the existing application and interface architecture with an eye toward areas of concern or known weakness. We also took inventory of the existing storm readiness and performance testing activities. Information gleaned included the test architecture, interface boundaries, load and monitoring tools, test data, monitoring points, and pass/fail criteria. The end-result of the assessment was a consolidated picture of strengths, weaknesses, and opportunities.
  • Target State Recommendations: Given the Storm Scenario Requirements and the Current State Assessment, Qualitest provided a roadmap of prioritized opportunities for achieving the Client’s goal of an end-to-end performance test. Recommendations identified critical improvements to people, process, and tools. Financial projections were provided with cost savings as a critical consideration. We applied our considerable experience with performance testing tools to identify the best mix of capability while taking into account technology that was already in place and/or licensed at the utility. In addition, offshore talent was brought in to reduce the expense of new automation development.

Tools Identified:

  • Instrumentation (Monitoring tools)
    • Server Statistics: HP OMi, SolarWinds, Splunk, Nagios, Azure Monitor, IBM Monitoring Suite
    • Database: Oracle OEM, IBM DB2 Data Management Console
    • Message Queues: Tivoli, Custom JMeter scripts
    • Application: AppDynamics, Azure Application Insights
  • Load Tools
    • JMeter (Custom system scripts)
    • LoadRunner/Performance Center/StormRunner
  • Analysis & Analytics
    • MS Excel
    • Power BI

Phase 2 – Test Implementation

In this phase, Qualitest took the recommendations from the strategy work and implemented them to make the test a reality. The work consisted of several tasks:

  • Test Planning: This included goals and constraints for outages per minute, throughput, response times and resource allocation as well as other success criteria identified in the strategy. These criteria were applied consistently across applications.
  • Results Reporting Prep: Created a test result reporting template for all applications.
  • Test Scripting: Created the automation needed to inject outage reports and outage status queries from web/mobile platforms, text messaging, and IVR. Other automation injected SCADA and AMI events and exercised COMPASS and Mobile Dispatch.
  • Execution Planning: Built a resource-loaded project schedule for preparing all test environments, staging the appropriate load and monitoring tools, conducting the tests, and gathering results.
  • Test Execution: Coordinated the test across 26 applications and 70 test participants (mainly utility employees) serving in various roles.
  • Gather and Report Results: Compiled results as reported from all teams and published findings in a consolidated set of boards.
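As a rough illustration of applying the same success criteria to every application, the sketch below checks each application's reported results against one shared set of thresholds. The metric names, threshold values, application names, and sample results are all hypothetical placeholders; the source does not disclose the actual criteria or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    min_outages_per_minute: float    # required sustained outage-report throughput
    max_p95_response_seconds: float  # allowed 95th-percentile response time


@dataclass(frozen=True)
class AppResult:
    app: str
    outages_per_minute: float
    p95_response_seconds: float


def evaluate(result, criteria):
    """Return the list of criteria this application failed (empty means pass)."""
    failures = []
    if result.outages_per_minute < criteria.min_outages_per_minute:
        failures.append("throughput")
    if result.p95_response_seconds > criteria.max_p95_response_seconds:
        failures.append("response time")
    return failures


# Hypothetical thresholds and made-up results, for illustration only.
criteria = Criteria(min_outages_per_minute=1000.0, max_p95_response_seconds=3.0)
results = [
    AppResult("app_a", outages_per_minute=1200.0, p95_response_seconds=2.1),
    AppResult("app_b", outages_per_minute=800.0, p95_response_seconds=4.5),
]

# One consolidated report: every application is judged by the same criteria.
report = {r.app: evaluate(r, criteria) for r in results}

for app, failures in report.items():
    status = "PASS" if not failures else "FAIL (" + ", ".join(failures) + ")"
    print(f"{app}: {status}")
```

Keeping the thresholds in one shared object, rather than per team, is what makes results from many applications comparable in a single consolidated report.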

Phase 3 – Continual Improvement

After each test iteration, the team met to capture lessons learned and identify improvement opportunities to increase the efficacy of the test. The team identified and implemented a variety of changes, including modifications to methods, tools, process and personnel.

Improvements continue: The Client, having seen a great deal of value in the test, has commissioned Qualitest to help find further efficiencies that reduce the lead time for running these tests and to proactively identify new test scenarios. Qualitest is doing just that by applying leading-edge analytics and AI platforms that digest a combination of data from production instrumentation and previous test runs.

Key Benefits

Qualitest was able to mobilize quickly and stand up the first ever end-to-end performance test of its kind in the industry, an effort that resulted in:

  • An improved understanding of the reliability and capacity limits of customer channels (web/mobile platforms, text messaging, IVR)
  • Identification of 78 corrective actions in the first year. Items covered a broad spectrum of issues, including performance, stability, and response time issues, system limitations, application defects, configuration, and environmental problems.
  • 69 corrective actions identified in the second year. Items covered problems across the same spectrum.

The cumulative impact of these benefits was a marked improvement in the Client’s IT system resiliency. That resiliency is critical for preventing, or at least surfacing, production system performance issues that, if left unaddressed, can cause significant reputational damage and adversely affect important reliability measures.
