Azure Support - IaaS PODS Case Management Process

Microsoft Azure

Azure IaaS PODS Case Management Process

CSS

Mission:

Solution Delivered First Contact and Accelerate Time to Resolution

The majority of Azure Platform related issues can be resolved quickly by obtaining the necessary information on first contact. We are the IT department for our customers. They depend on us to swiftly evaluate their symptoms and provide solutions. By doing so, we will help overcome common misconceptions that problems stem from the Azure Platform. The target is DTS (Days to Solution) < 1.

CASE CREATION, ROUTING and ASSIGNMENT

Verify Eligibility and Identity:	In general, engineers are required to validate the customer's identity and eligibility whenever the case was created outside of the Microsoft Azure Portal, for potentially destructive issues (e.g. deleting a storage account, VM/VHD, Service Bus Namespace, etc.), or when granting subscription access (e.g. adding a service co-administrator, Admin Mode GFS Backend, etc.). Note: These actions have been taken over by the ASMS team. To verify customer eligibility when needed, validate at least three pieces of information (see instructions on using CMAT here: Azure Support - Verify Customer Eligibility and Identity).
Free Cases:	With Azure Basic Support continuing to roll out, free support for technical issues related to the Azure platform will be more common. In line with Azure Basic Support is Azure Resource Health, which also allows customers to open technical cases for platform related issues without a paid support plan. Azure Basic Support is limited to Severity C by default. An exception occurs, however, when it is combined with a feature/service such as Azure Resource Health, which enables all severities by design. If you receive free cases which originated from the ASMS team, these cases should not be blocked from receiving technical support. We want to do the right thing for our customers needing help. If the customer needs technical assistance, ASMS may enable 30 days of Developer Support and ask the customer to open a case from within the portal; however, if the customer is upset with this ask, ASMS will create the free case. ASMS should not set severity to A or set call-back expectations with the customer on behalf of Azure technical teams.
Professional Direct Customers:	Microsoft Professional Direct (ProDirect) Support for Microsoft Azure provides first-class support designed especially for mid-sized customers that require elevated support. Please follow the same processes for ProDirect customers as you would for Premier customers. When transferring a misrouted SEV-A case, engage the appropriate team's Duty Manager, conveying the type of customer and level of service, to locate an immediate resource. Note: While ProDirect customers do not have a dedicated Technical Account Manager (TAM), they do have a team that acts as a TAM. Please use Azure Support - How To Contact The ProDirect Team to contact this team.
Engage Customer:	Service Requests (SR) are routed to the global POD queues. If there are cases present in the global POD queue or there are cases in your "Incoming" bin, the guidelines below should be observed:
- If you are not actively working on a case, ensure to set Bullseye client status to "Available".
- Review the case and notes in their entirety to accurately assess correct team ownership.
- Isolate if case remains with this team or another team/queue. If the case belongs to a different team then route it accordingly keeping in mind the following:
  - Bullseye allows for 15 minutes after assignment to assess if misroute before case is reassigned.
  - Create a SR (Pre-Scope) [Asset 4498193] indicating why you are transferring and complete it.
  - Update Support Topic to transfer versus a manual transfer.
  - See Misroutes for additional guidance.
- To greatly increase the probability of reaching the customer and providing Solution Delivered First Contact (SDFC), available engineers are encouraged to proactively contact the customer within the first 30 to 60 minutes of the SR hitting the queue (without SR being assigned).
- Additionally, the Azure Services Duty Managers <azuresvcsic@microsoft.com> team also monitors the POD queues and will seek out and inform engineers of SRs needing assistance as needed.
- Once Engineer takes ownership of the Service Request, call the customer regardless of communication preference. See Communications for more guidance.
- Note: During spikes in case volume, SRs should be prioritized based on highest support offering (Premier, ProDirect), severity and business impact.
Communications:	For Azure customers, periodic and consistent communication is critical in delivering quality service experiences and raising CPE. With the exception of production down situations, customers prefer to receive updates with the following methods:
- Manage Tickets functionality in the portal
- E-mail
- Phone
E-MAIL / Phone Communication
- It is important to honor the customer’s communication preference (email or phone).
- For customers who prefer email, think about asking the customer over email if a phone call can be scheduled (if it would be a more efficient way to resolve the case).
- If a customer does not respond to your email (e.g. case has been idle for almost 2 days), it is acceptable to call the customer during their day time hours.
- Be sure to create a "Phone Communication" log by creating a Troubleshooting [Asset 4498193] and click on "Create Linked Phone Log" when a phone attempt has been made and/or a phone conversation has taken place with the customer while troubleshooting an issue. A Phone Communication log is automatically created in Communications and associated with the Troubleshooting [Asset 4498193]. For example: "This phone conversation is created as part of ' [Asset 4498193] name ' task."
- Engineers should provide SR status updates (email or phone honoring customers communication preference) regularly. See Daily Actions / 2935776: Azure Support - Azure IaaS PODS Daily Case Wellness Actions
- Note: "If a customer has to ask an engineer for an update on a case, we have missed an opportunity to exceed their expectations."[Asset 4497459]
- It is also recommended engineers send an update on the last day of shift to ensure customer is aware of work schedule and provide an option to work with the next available engineer via Azure Support Backup <azbackup@microsoft.com>. Please refer to 2912580: Azure Support - IaaS PODS Case Transfer Process for more details.
- Also refer to 2935781: Azure Support- IaaS PODS Email Replies to Difficult Support Scenarios when situations arise.
Misroutes:	For cases that are determined to be misroutes, follow the guidance below:
- First engineer owns the customer and their experience, even if misrouted.
- If correct team/queue is unknown, engineer contacts the customer.
  - Collect basic info (including Azure Service) to frame the issue and record in SR pre-scope [Asset 4498193].
  - Inform the customer you are working to engage the best people for their request, you are their point of contact in the meantime. Avoid using the words "transfer your case".
- Use Radius to identify correct team/queue. Search Tip: Use keywords #AzurePOD or #AzureSupport for Azure Support Teams.
- If you cannot find correct team/queue in Radius within ~15 min.:
  - Email Azure DM DL.
  - If DM knows specific TA to contact, should do so directly. If not, DM to email Azure TA DL: Case routing for Azure <whichazureteam@microsoft.com>.
  - DM communicates correct queue and point of contact to engineer.
  - DM responsible to record defect (SR#, incoming queue, misrouted queue, Radius defect).
- Engineer contacts team (TA / TL / TM / DM) of the receiving queue:
  - Confirm that identified team is the correct team for the case via IM, email, phone, and document who confirmed in the SR pre-scope [Asset 4498193].
  - If unable to contact the receiving team (e.g., all offline), or if the identified team is incorrect, email Azure DM DL.
- Record reason for transfer in the SR pre-scope [Asset 4498193].
- Transfer case to receiving queue. Avoid transferring cases using Manual Override.
  - Select the Transfer Reason of "Misroute".
  - Change SR Routing Product, Support Topic.
  - Click Update Target Queue and Transfer.
- Ownership remains with first engineer until receiving team assigns a new owner. It is the original engineer's responsibility to remain the customer's point of contact and follow up if no new owner is being assigned. If needed, engage DM for assistance to ensure a new owner is assigned.
- Key Points
  - First engineer OWNS the customer and the customer experience until a new owner has been assigned
  - If engineer cannot locate correct team via Radius, DM will assist
Local Language:	Bullseye assigns Local language (LL) cases to all engineers. Engineer will engage customer and only move ownership if the customer requires LL support. If the customer requires LL support, work with POD Leads/TAs/Engineers to find someone who can meet the language requirement. REMINDER!! 24x7 is English-only.
Support Topic Classification:	Ensure accurate coding of Support Topic (this can be performed at any phase of the case but not after closure).
Service Impacting Event / Outage:	When a SIE (outage) is declared by Pod Lead / Pod TA, a case management process exception occurs. Azure Service outages are managed by the Azure Pod(s) that have scope with the service impacted. SIE roles are identified and applicable processes in the SIE Playbook (Web view) are executed including 2834002: Azure Support - Handling Service Impacting Event (SIE) Outages. Engineers in impacted Pod(s) will go offline in Bullseye. Engineers not involved in SIE management will pull normal/non-SIE cases and continue to work per existing processes. Pod Leads / TAs will continuously monitor the queue and state of outage until mitigation.
ASMS Credit Requests:	During Service Interruptions (SIE or other events), requests for credit may be initiated by customers. After the technical issue has been mitigated, the engineer can create and dispatch a [Asset 4498193] to the "Azure Services - Subscription & Billing" queue containing the customer's e-mail, Microsoft ID, Subscription ID, affected services and the duration of the interruption (see 2797526: Azure Support – APTS/CIE/WAWS/ARR to WASMS credit and refund requests) for processing. The ASMS engineer will provide a new case number for the credit request, which the engineer will then reference in the closing e-mail.

SCOPING

Initial Response / Scoping Phase:

Example 1:

Hi ,

Thank you for contacting Microsoft Support. My name is <Name>. I am the Support Professional who will be working with you on this Service Request. You may reach me using the contact information listed below, referencing the SR number <113111612345678>.

I want to take this opportunity to start addressing the issue symptom.

Based on your description, we understand that you had an unexpected restart or shutdown of your VM under deployment XYZ, but that at this point the machine is up and running. Can you confirm if this is correct?

…

Example 2:

Hi ,

[ Issue Definition: ]

[ Scope: ]

[ Findings: ]

If you have any questions or concerns, please do let me know.

- IMPORTANT: Combine IR and Scoping to ensure every communication adds value.
- Engineer performs initial response and scopes problem in the same communication following the guidance below:
  - Check for necessary SR data
    - Customer contact information, including preferred method of contact, time zone of availability, and ideal time for call-back
    - Impacted Azure service
    - Problem description (error message, screenshots, other technology-specific identifiers)
    - Impact timestamp
    - Issue status (issue still occurring?)
    - Business impact (severity)
  - Perform Initial Research if necessary SR data is available (no longer than ~15 minutes), otherwise, contact customer and collect the data. If customer is unavailable, can you look up and verify anything with the limited information provided?
    1. Assess if relevant
    2. Review case description
    3. Review/validate data
    4. LIMITED research (don’t slide into troubleshooting)
    5. Contact TA/mentor if necessary
  - Scope (based on available data and initial research) and deliver initial response with below four elements
    1. Problem Description
    2. Business Impact (see Business Impact below for more guidance)
    3. Scope agreement (set and manage expectations; end goal)
    4. Next actions (could be implement solution)
- Below are examples which can be customized for combined Initial Response and Scope e-mail.
- Note: a good practice for engineers is to explicitly state work schedule and direct customer to Azure Support Backup <azbackup@microsoft.com> in the event continued support is needed and engineer is off shift. Additionally, it is recommended to verify customer's preferred work hours to optimize interaction.
  - See End of Shift / Working Global English 24x7 Issues below for more details.
- IMPORTANT: Once enough detail has been collected (either from the Customer Verbatim or request for additional information), complete the actual [Asset 4497459]case scope by removing the customer's e-mail and casemail addresses and adding your own email address to prevent another communication to customer (one e-mail for combined IR and Scope).
  - DO NOT change customer's Preferred Contact Method to Web as a workaround since "Web Communication" cannot be Set Customer Non-Viewable.
- See 2936135: Azure Support - IaaS PODS IR and Scope Emails for more details.
- Key Points
  - Collect Standard set of Initial SR Data
  - Consistent 5 steps in Initial Research
  - Do not spend more than ~15 minutes doing Initial Research
  - Consistent 4 elements of Scoping Email
  - Scoping Email is also the IR Email
  - TA will be alerted on cases that have not been scoped (communicated, but not necessarily confirmed by customer) within time of case creation + 2 hours
  - DO NOT focus on IR SLA if it means low-value customer communications

Business Impact:

- Communication guidance for understanding impact:
- Reiterate the issue for understanding. Do not just copy/paste customer verbatim. For example: "Based on your description, we understand <paraphrase customer's problem/symptom description>"
- Ask discovery questions to understand business impact (without just asking "What is the business impact?"). For example: "How is this issue affecting the daily operation of your business?"
- Depending on the issue, include some probing questions up front or, once the customer responds, continue the conversation with more questions using the following guidance:
  - Is this a Production or Non-Production server/VM?
  - Infrastructure server? Login/Replication
  - Application server? HR/Accounting/Point of Sales/New project
  - Are you or your organization on a tight deadline?
  - Do you have any SLAs with internal/external customers?
  - In terms of recurrence, has this happened before?
  - Are there other pain points I need to be aware of? e.g. "My CEO is looking at this issue…"

TROUBLESHOOTING

Diagnostic Data:

- When data from customer's Azure resources (e.g. Guest OS logs) is needed for analysis, always obtain customer consent to collect any data and document the customer’s consent in a manner that is easily discoverable.
- See also 3122733: How Customers Can Consent to Collection of Diagnostic Data from Azure Portal
- Any data collected will be handled in accordance with the terms of the Azure customer agreement, as well as our Online Services Privacy Statement.

Troubleshoot Problem:

- Troubleshoot problem and offer solution
  - Communicate solution to customer in the form of a step-by-step action plan.
  - Document solution and indicate that solution has been offered in [Asset 4497459]case.
- Key Points
  - TA will be alerted on cases that do not have Solution [Asset 4498193] completed (communicated, but not necessarily confirmed by customer) within time of case creation + 8 hours
  - If no Solution has been offered because the engineer is waiting on customer data, TA/Engineer logs a TFS incident (RDTask) to record the reasons for the troubleshooting delay
  - Manager will be alerted on cases with no solution offered at case creation + 16 hours and will take the necessary actions to identify and remove road blocks
  - If no Solution has been offered more than 24 hours after case creation, TA performs a CCE review, documents the reason for exceeding 1 day, and takes the necessary actions depending on that reason
  - TA will perform necessary reviews and take action on cases with no solution offered at case creation + 3 days, + 5 days and every 5 days until solution offered
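The review checkpoints above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `due_reviews` and its labels are hypothetical and not part of any MSSolve or Bullseye tooling. It also assumes that "every 5 days" means reviews at +10, +15, ... days after case creation.

```python
from __future__ import annotations

from datetime import timedelta


def due_reviews(age: timedelta, solution_offered: bool) -> list[str]:
    """List the alerts/reviews that have come due for a case of the given age.

    `age` is the time elapsed since case creation. Once a solution has been
    offered, none of the checkpoints apply.
    """
    if solution_offered:
        return []

    due = []
    if age >= timedelta(hours=8):
        due.append("TA alert (+8 hours)")
    if age >= timedelta(hours=16):
        due.append("Manager alert (+16 hours)")
    if age > timedelta(hours=24):  # the text says "> 24 hours"
        due.append("TA CCE review (> 24 hours)")
    if age >= timedelta(days=3):
        due.append("TA review (+3 days)")
    # Assumption: +5 days, then every further 5 days (+10, +15, ...).
    for k in range(1, age // timedelta(days=5) + 1):
        due.append(f"TA review (+{5 * k} days)")
    return due
```

For example, `due_reviews(timedelta(hours=9), solution_offered=False)` lists only the +8 hour TA alert.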

SR Titling Taxonomy:

- To help in prioritizing and managing service requests, engineers are required to change the SR title to follow the guideline formats. For example:
- S:SIE|T:H|D:6/25/2015|DEV|A|LSI:1102209|AzureCompute|IaaS:Windows|VMFaulted <TSG 1287659>: Cannot Start VM
- S:CTS|T:C|D:6/25/2015|DEV|C|EEECRI:1120306|AzureCompute|IaaS:Windows|RDPFailError <TSG 1287935>: 4 VMs Lost RDP Connectivity
- S:CUX|T:W|D:6/25/2015|DEV|C|OPSCRI:1146983|AzureCompute|IaaS:Windows|RoleState: Azure VM multiple unexpected shutdowns
- The State in the SR title needs to match the MSSolve SR Wait State.
- See 2935777: Azure Support- IaaS PODS MSSolve SR Titling Process for further details on states.
  - Use the Azure Case Title Generator Tool to ensure consistency.
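The example titles above share a pipe-delimited layout, which makes them easy to check mechanically. The sketch below is a hypothetical helper, not the Azure Case Title Generator Tool. It assumes nine pipe-separated fields as in the three examples, and the dictionary keys for the untagged fields (`offering`, `severity`, `tracking`, `product`, `technology`, `symptom`) are guessed names; the authoritative field definitions are in 2935777.

```python
def _tagged(field: str, prefix: str) -> str:
    """Strip a required prefix such as 'S:' from a title field."""
    if not field.startswith(prefix):
        raise ValueError(f"expected field starting with {prefix!r}, got {field!r}")
    return field[len(prefix):]


def parse_sr_title(title: str) -> dict:
    """Split an SR title like the examples above into its fields.

    Key names for untagged fields are assumptions made for illustration.
    """
    parts = title.split("|", 8)  # the last part keeps any further '|' characters
    if len(parts) != 9:
        raise ValueError(f"expected 9 pipe-separated fields, got {len(parts)}")

    # The final field looks like "<symptom> <TSG n>: <free-text description>".
    symptom, _, description = parts[8].partition(": ")
    return {
        "state": _tagged(parts[0], "S:"),
        "t": _tagged(parts[1], "T:"),
        "date": _tagged(parts[2], "D:"),
        "offering": parts[3],
        "severity": parts[4],
        "tracking": parts[5],
        "product": parts[6],
        "technology": parts[7],
        "symptom": symptom,
        "description": description,
    }
```

A parsed `state` value can then be compared against the MSSolve SR Wait State by whatever mapping 2935777 defines; that mapping is not reproduced here.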

Customer Viewable:

- Manage Visibility of Case Data
- By default, Troubleshooting Tasks are customer viewable. Whenever possible, only leave logs pertaining to communication, case progression, status, summary, and action plan updates with customer, viewable to customer.
- Logs that involve data analysis, research and troubleshooting where *internal data, conversations and processes are detailed and discussed MUST NOT BE customer viewable.
  - Uncheck "Customer Viewable After Commit" for the Troubleshooting [Asset 4498193]
  - *Internal data is defined as data and tooling that the customer does not have direct access to.
- MS Policy: Work Environment: All internal communications (emails, memoranda etc.) are intended for internal use only and are not for distribution outside of Microsoft. Where legitimate business needs call for distribution of internal communication outside the company, appropriate authorization should first be obtained.

Case Documentation:

Engineers are encouraged to provide visible daily bulleted, itemized summary logs including action plans for both customer and engineer.

02.11.2013 - UPDATE/ACTION/PLAN

POA.CUSTOMER:

POA.APTS:

Engineers should create logs within the specific action [Asset 4498193] that supports work performed for the [Asset 4498193]. Examples of Troubleshooting [Asset 4498193] Titles with specific actions:

- We want to simplify case documentation and make it consistent. One thing we have to remember is that Azure customers are able to manage their cases online, and ideally that is how we should communicate with them so they always know the status of the issue. To achieve this, there should ideally be 3 tasks per [Asset 4498193], although this is not a hard rule.
  1. Solution Task
  2. Troubleshooting [Asset 4498193] with a Task Title: Plan of Action / Next Steps Task
- Example 1
- Example 2
- Example 3
- Action Items for Customer (if applicable)
- Next steps for engineer
  3. Troubleshooting [Asset 4498193] with a Task Title: <specific action> Task
- Reviewing logs to determine why VM rebooted
- Reviewing MDS logs
- Reviewing Network traces
- Engaged EEE with CRI#
- Engaged WASU with CRI#
- Reminder: Any Troubleshooting Task wherein *internal data is documented needs to be CUSTOMER NON-VIEWABLE. EEE/CXP (WASU) Engagement [Asset 4498193] is ALWAYS NON-VIEWABLE by customer.
  - *Internal data is defined as data and tooling that the customer does not have direct access to.

Daily Actions:

- Daily case wellness actions include:
  1. Process Adherence: Reference and practice guidelines detailed in 2834722: Azure Support - IaaS PODS Case Management Process
  2. Document, Document, Document: Document in detail the scope/impact, background/symptoms, initial research and troubleshooting performed, next action plan/action required from customer and engineer.
  3. Update Case Titling and SR Wait State Each Day following actions outlined in 2935777: Azure Support - IaaS PODS MSSolve SR Titling Process
    - The State in the SR title needs to match the MSSolve SR Wait State.
  4. Touch Every Case Every Day: No SR Idle Days > 1 while in the office (you have to enter labor to flip the idle switch) unless customer expectations set otherwise; In general, no case should go idle > 2 days.
  5. No Idle CRIs – Follow Engagement Process & update case notes on current status
    - 2972622: Embedded EE CRI (EEECRI) Process
    - 2869687: WASU (OPS) Engagement Reference and FAQ. Note: There should be no Blocked CRIs (action on engineer) idle > 3 days.
  6. No Aged Cases > 1 day without EEE/TA engagement
  7. Unresponsive Customers (Solution not Offered)
    - Use the established 3 strike process documented in OneStop as a guideline.
    - Prior to invoking the 3rd strike, engage Manager/TA as appropriate.
  8. Engage TA/Manager on any CRIs open > 5 days with no substantial progress made
  9. Call Coding: Accurately update Support Topic, Symptom and Root Cause Classifications
- See 2935776: Azure Support - Azure IaaS PODS Daily Case Wellness Actions for further details.
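The numeric thresholds in the daily actions above can be expressed as a simple per-case check. This is a minimal sketch under stated assumptions: `CaseSnapshot` and `wellness_flags` are hypothetical names, the inputs are assumed to be already available as day counts, and the "unless customer expectations set otherwise" exception is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CaseSnapshot:
    """Hypothetical per-case inputs for a daily wellness check."""
    idle_days: int                      # days since the SR was last touched
    age_days: int                       # days since case creation
    has_eee_or_ta_engagement: bool
    blocked_cri_idle_days: int = 0      # 0 when there is no blocked CRI
    cri_open_days: int = 0              # 0 when there is no open CRI
    cri_progressing: bool = True        # substantial progress on the CRI?


def wellness_flags(case: CaseSnapshot) -> list[str]:
    """Return the daily-action items a case currently violates."""
    flags = []
    if case.idle_days > 1:
        flags.append("SR idle > 1 day")
    if case.age_days > 1 and not case.has_eee_or_ta_engagement:
        flags.append("aged > 1 day without EEE/TA engagement")
    if case.blocked_cri_idle_days > 3:
        flags.append("blocked CRI idle > 3 days")
    if case.cri_open_days > 5 and not case.cri_progressing:
        flags.append("CRI open > 5 days: engage TA/Manager")
    return flags
```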

General Collaboration:

- See OneStop Process and Procedure steps for global collaboration.
  - In short:
    - Use a [Asset 4497459] [Asset 4498193] for collaboration within same PODs, team/technology.
    - Use a [Asset 4497459][Asset 4498193] for collaboration with different PODs, team/technology.

Embedded EE Collaboration:

2972622: Embedded EE CRI (EEECRI) Process

- Follow the steps documented in 2972622: Embedded EE CRI (EEECRI) Process when assistance is needed to further troubleshoot an issue:
- Always obtain permission from the customer to collect any data / logs for troubleshooting purposes. If any data / logs are collected, please be sure to share the location in the EEE CRI. Obtaining customer's consent for data / logs ahead of time will help EEEs provide faster resolution.

CXP (WASU) Collaboration:

Example:

WASU	Operations Team

- You should always engage and collaborate with SMEs / TAs / EEEs first and based on guidance determine next steps.
- When working with WASU, please avoid the use of the term "backend team" in all customer-facing communications and phone calls. Instead, refer to our contact points by their general business function, which customers can easily understand.
- Avoid use of the word "Escalate" in all customer-facing communications, phone calls, etc. Instead use "Engage", e.g. I am engaging with the operations team; I am engaging my colleagues, etc.
- Use verbiage that demonstrates ownership on our part (for example, "I am working with our Operations team…" instead of "Our Operations Team is working…").
- Please refer to 2869687: Azure Support - WASU (Ops) Engagement Reference for other details.

ASMS Collaboration:

- In the event that ASMS needs technical assistance with a case, they will engage Azure technical teams via a [Asset 4498193] and will participate in any customer facing conversation and should be considered the customer’s point of contact for the case.
  - Any cross group/technology collaboration engagements shall be conducted via [Asset 4498193].
- If ASMS has a case where the billing/subscription issue was resolved and now customer has a technical question, they will either collaborate via a [Asset 4498193] as called out above or create a new case for the technical question and route to the appropriate technical team.

Partner Collaboration:

- See Collaborating with an external partner using Ticket Exchange Platform (TEP)
- See [DIAGRAMS] 3116286: Red Hat and Azure Support Collaboration Process

Bugs:

- If you have a case that is associated with a bug, please make sure to put that bug number in the MSSolve Case
  - Please review the process here 2869687: Azure Support - WASU (Ops) Engagement Reference
  - In short, do the following:
    1. Click on the Problem
    2. Go to the Admin tab
    3. Select "Problem Type" as "Bug"
    4. Select "Bug database" as "RD"
    5. Select "Bug ID" and enter the Bug number
    6. Update the SR Title with RDBug (or CRI) number. For example: S:OPS|T:C|D:1/23/2015|DEV|SEV-C|RDBUG:123456|AzureCompute|…
- If you receive a case that has a known bug associated with it you should also be updating the hit count in the CRI.
  - This is critical because this is what the PG and Engineering teams are looking at to prioritize their work
- If there are multiple bugs identified in one SR, determine if bugs are linked or not in TFS. Check with SME/TA/EEE which bug is being tracked. Otherwise, make sure to document in [Asset 4497459] a [Asset 4498193] noting all known bugs related to issue, inform SME/TA/EEE and communicate to rest of team which bug to use.

Case Transfer:

- Case transfer situations can occur at initial case creation, assignment and during the troubleshooting phase.
  - Irrespective of where the case first lands, engineer is expected to take necessary work and actions owning the customer experience.
  - Review the Misroutes section as needed
  - Review Initial Response / Scoping Phase section as needed
- When transferring a case within a Pod of the same technology (Intra Pod), follow the guidance below:
  - Engineer that owns case needs to start transfer process 1 hour before end of shift.
  - If the case can be closed, close the case.
  - If the case is Severity A or requires continued 24x7 work (and customer is available), verify support topic, select appropriate transfer reason and transfer the case, documented with the standard SOAP template, back to the global queue for next available engineer.
    - Once next available engineer is identified, do a warm hand over of the case.
  - If the case is not Severity A or does not require 24x7 (or customer not available) AND customer is not in engineer's local time zone, verify support topic, select appropriate transfer reason and transfer the case with standard SOAP template documented back to the global queue for next available engineer in customer's time zone.
    - Create a Task with commitment.
    - If customer is in engineer's local time zone, DO NOT transfer the case. Engineer owns case until closure.
  - If the case has been set to solution delivered but not confirmed (SD-Pending Confirmation), the case should not be transferred. Instead, if the customer is not ready, is unresponsive, or follow-up is needed, set the expectation with the customer to use Azure Support Backup <azbackup@microsoft.com> with a planned follow-up date. If some immediate action is needed, use tasks per the normal (intra-pod) process.
- Please refer to 2912580: Azure Support - IaaS PODS Case Transfer Process for more details.
- Key Points
  - Case transfers are based on criticality and customer availability.
  - Case transfers happen within established windows (1h before end of shift) and consistent standard (SOAP / Transfer Template).
  - Case transfers are only applicable for English-speaking customers.
  - Ensure engineer’s signature or OOF message points to a valid DL that the customer can reach if assistance is needed outside the engineer's work hours
  - Customer centric view - Case should "flow" to Customer's time zone.
  - Cases pending Research / CRI / RCA with 24/7 flag not set should not be transferred.
  - Collaboration between pods of different technologies or outside of Pods is done using a [Asset 4498193].
  - Engineer delivering/confirming the solution closes the case, even if customer is from another time zone.
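The intra-pod end-of-shift guidance above amounts to a short decision sequence. The sketch below is one possible reading, not an official rule engine: the `Case` fields and `Action` names are hypothetical, and the order in which the checks are applied is an assumption. It does not cover the language restriction or the Research / CRI / RCA key point.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CLOSE = "close the case"
    KEEP = "do not transfer; engineer keeps ownership"
    TRANSFER_NEXT_AVAILABLE = "transfer to global queue, warm handover to next available engineer"
    TRANSFER_CUSTOMER_TZ = "transfer to global queue for customer's time zone, with Task commitment"


@dataclass
class Case:
    """Hypothetical inputs describing a case at end of shift."""
    can_close: bool = False
    solution_pending_confirmation: bool = False  # SD-Pending Confirmation
    severity_a: bool = False
    requires_24x7: bool = False
    customer_available: bool = True
    customer_in_local_tz: bool = True


def end_of_shift_action(case: Case) -> Action:
    """Pick the end-of-shift action for a case (assumed check order)."""
    if case.can_close:
        return Action.CLOSE
    if case.solution_pending_confirmation:
        # Not transferred; point the customer to Azure Support Backup instead.
        return Action.KEEP
    if (case.severity_a or case.requires_24x7) and case.customer_available:
        return Action.TRANSFER_NEXT_AVAILABLE
    if not case.customer_in_local_tz:
        return Action.TRANSFER_CUSTOMER_TZ
    return Action.KEEP
```

For instance, a Severity A case whose customer is unavailable and sits in the engineer's own time zone stays with the engineer under this reading.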

End of Shift / Working Global English 24x7 Issues:

- Azure Support on call is on an exception basis, not the rule.
- Please refer to 2912580: Azure Support - IaaS PODS Case Transfer Process for more details on end of shift and 24x7 issues.
- Also refer to Use Commitments when transferring scheduled Callbacks and Warm Handovers for use of [Asset 4497459] [Asset 4498193] Commitments.

Case Management when OOF:

Review 2912580: Azure Support - IaaS PODS Case Transfer Process and your responsibilities when going OOF which are:

Engineers going OOF for less than a week / normal OOF:
- Reach out to the customer based on “Preferred Contact Method” informing of OOF and use of Azure Support Backup <azbackup@microsoft.com> if customer cannot wait till you resume shift again.
- Ensure complete documentation of any troubleshooting, action plans, etc.
- For active cases, communicate with Manager/TA to load balance cases and/or determine best course of action.
Engineers going OOF more than a week:
- Reach out to the customer based on “Preferred Contact Method” informing of extended OOF and use of Azure Support Backup <azbackup@microsoft.com>.
- Ensure complete documentation of any troubleshooting, action plans, etc.
- Send a case summary of active/pending cases to your Manager/TA at least 2 days prior to the start of vacation so cases can be load balanced across the team and/or the best course of action can be determined.

SOLUTION

Solution [Asset 4498193] / DTS:

IMPORTANT: SIE cases are excluded from DTS because, as part of the SIE process, they are not scoped; therefore there is no Solution [Asset 4498193] (DTS=null). See 2834002: Azure Support - Handling Service Impacting Event (SIE) Outages for more details.

A complete Solution [Asset 4498193] not only adds value and increases CPE for customers, but also can be invaluable for other engineers to utilize as a reference for related issues. Solution Tasks should be marked "Customer Viewable" whenever possible.

SOLUTION TITLE:

The solution title should describe the customer's issue in the fewest words possible.

SYMPTOM:

The symptom section should include an updated Issue Definition from the case scope.

CAUSE:

The cause section should contain as much detail as possible describing what caused the issue to occur.

RESOLUTION:

The resolution section should describe all possible resolutions and/or workarounds in detail.

ADDITIONAL INFORMATION / SUGGESTIONS

The additional information / suggestions section should include any applicable resources such as Knowledge Base, MSDN and blog articles. This section is also a good place to provide value by including summaries of additional questions / concerns brought up during the incident.

- We need to drive down our DTS (Days to Solution), pure and simple. The reason Days to Solution is in BOLD is to drive your awareness that the focus is on Solution, not Days to Close (DTC). Many of you are actually delivering "Solutions" to our customers but neglecting to complete the Solution Task, which drives DTS up and gives the appearance that we are not meeting our DTS target of < 1 day.
- Any time you are providing the customer with a potential solution to their problem such as but not limited to:
  - Mitigation
  - Workarounds
  - Identification of a Bug
  - RCA’s
  - Action Plan/Troubleshooting Steps to be followed should issue recur
- For each of these potential solutions, you should be completing the Solution Task in your cases to stop the DTS clock AND changing the SR Wait State to SD-Pending Confirmation.
  - Follow up if offered solution has not been confirmed within 24 hours and determine reason:
    - Was it because Solution Rejected?
      - Return to Troubleshooting
    - Was it because Waiting on Customer to test?
      - Proactively assist customer with testing
    - Was it because Customer has not responded?
      - Follow up to ask if solution worked and suggest closing with simple instructions to reopen case.
    - Was it because Customer requests more time to confirm?
      - Can be situational, request manager assistance as needed.
- That being said, DO NOT complete the Solution Task if in fact you haven’t provided a solution to the customer.
- When customer indicates given solution has indeed mitigated/resolved the issue, change the SR Wait State to SD-Solution Confirmed and proceed with case closure.
- In summary, complete the Solution Task upon providing a solution to the customer. If the customer replies stating your solution didn’t resolve/mitigate their issue, create a new Solution Task so DTS clock starts again.
- SOLUTION TASKS (Customer Viewable):
- Key Points
  - Proactively follow up if you have not heard from the customer 24 hours after the solution was offered
  - Proactively work with customer to confirm solution
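
The Solution Task workflow above can be read as a small decision procedure. The following is a minimal sketch in Python, not a tool that exists in the case management system: the names `Reason`, `NEXT_ACTION`, `wait_state`, and `follow_up` are hypothetical, and only the wait state names and the 24-hour follow-up window come from this article.

```python
from enum import Enum


class Reason(Enum):
    """Why an offered solution is still unconfirmed (hypothetical labels)."""
    SOLUTION_REJECTED = "solution rejected"
    WAITING_ON_TEST = "waiting on customer to test"
    NO_RESPONSE = "customer has not responded"
    MORE_TIME_REQUESTED = "customer requests more time to confirm"


# Next action per reason, paraphrased from the follow-up list in this section.
NEXT_ACTION = {
    Reason.SOLUTION_REJECTED: "Return to troubleshooting",
    Reason.WAITING_ON_TEST: "Proactively assist customer with testing",
    Reason.NO_RESPONSE: "Ask if solution worked and suggest closing with reopen instructions",
    Reason.MORE_TIME_REQUESTED: "Situational; request manager assistance as needed",
}

FOLLOW_UP_HOURS = 24  # follow-up window stated in this section


def wait_state(solution_offered, customer_confirmed):
    """Return the SR Wait State implied by the Solution Task status.

    Returns None when no solution has been offered, because the
    Solution Task must not be completed in that situation.
    """
    if not solution_offered:
        return None
    if customer_confirmed:
        return "SD-Solution Confirmed"
    return "SD-Pending Confirmation"


def follow_up(hours_since_offer, reason):
    """Return the next action, or None if the follow-up is not yet due."""
    if hours_since_offer < FOLLOW_UP_HOURS:
        return None
    return NEXT_ACTION[reason]
```

For example, `follow_up(30, Reason.NO_RESPONSE)` returns the action for an unresponsive customer, while `follow_up(12, Reason.NO_RESPONSE)` returns `None` because the 24-hour window has not yet elapsed.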

Administrative Phase:

- When no further actions are needed and the case is ready to close, but the customer is not willing to close it for whatever reason, be sure to close out the [Asset 4498193] to put the case into an Administrative Phase while you continue to work with the customer (and TA/Manager) to come to an agreeable conclusion.
- A case waiting on a hotfix, where the engineer can take no further action and the customer refuses to close the case until the hotfix is available, should be placed in Administrative Phase.

CLOSING

Classification:

Note: Wherever applicable, more than one Symptom / Cause can be added per SR.

- Record all classifications.
- Ensure accurate coding of Support Topic, Symptom and Root Cause classifications (the latter two can be performed at any phase of the case once scoped).
- IMPORTANT: Accurately coding root cause is especially important, as this data is used by CSS and PG to identify improvements to both tooling and the platform (e.g. CSS Asks in Azure planning cycle) to drive down DTS/MPI. DO NOT CODE TO 'OTHER'!

SR Closure Phase:

Whenever possible, use the established 3 strike process documented in OneStop as a guideline.

- Close case as soon as possible with appropriate solution within 24 hours of customer confirming solution.
- Verify all labor is recorded.
- Unresponsive Customers (Solution not Offered):
  - Once it is recognized that a customer is unresponsive after communication of findings and actions, a follow-up communication is made.
  - Engage a Manager/TA to call the customer (call down) prior to invoking the 3rd Strike.
  - If there is no response after the 3rd Strike (or as otherwise directed by Manager/TA), a closing email is sent setting the expectation that the case will be closed by a certain time but can be reopened once the customer is available …. The case is then closed. For example:

I just wanted to follow up if you had a chance to go over <findings, actions and/or resolution>. I will go ahead and leave this case open till end of business day <tomorrow, following day, next week> and proceed with case closure. If this action is incorrect, please do feel free to contact me and I can reopen the case and resume working with you.

- Closing Emails (Customer Viewable):
  With a Solution [Asset 4498193] fully documented, the closing e-mail should be a simple copy / paste. Below is an example which can be utilized for closing e-mails:

Hi <Customer Name>,

It was my pleasure to assist you with your <Issue Description> issue. I hope that you were delighted with the service provided to you. I am providing you with a summary of the key points of the case for your records. If you have any questions please feel free to contact me. My contact information is below.

ISSUE DESCRIPTION

CAUSE ANALYSIS

SOLUTION / WORKAROUND

ADDITIONAL INFORMATION / SUGGESTIONS

Thank you for contacting Microsoft and for choosing Microsoft Azure.

- Key Points
  - Engineer to close case ASAP after customer confirms solution (or CSS process indicates closure as next step)
  - Record labor and all classifications (Support Topic, Symptom, Root Cause)
  - Manager to follow up if case open 24 hours past “SD Confirmed”
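
The closure key points can also be expressed as a pre-closure checklist. This is a sketch under stated assumptions: `CaseRecord`, its field names, `closure_blockers`, and `manager_follow_up_due` are hypothetical and do not correspond to any real case management API. Only the checks themselves (labor recorded, all three classifications present, root cause not coded to 'Other', manager follow-up 24 hours past confirmation) are taken from this section.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    """Hypothetical snapshot of the fields checked before closing an SR."""
    labor_recorded: bool
    support_topic: str
    symptom: str
    root_cause: str
    hours_since_confirmed: float


def closure_blockers(case):
    """Return a list of reasons the case is not yet ready to close."""
    blockers = []
    if not case.labor_recorded:
        blockers.append("labor not recorded")
    # All three classifications must be recorded.
    for name in ("support_topic", "symptom", "root_cause"):
        if not getattr(case, name).strip():
            blockers.append(f"{name} missing")
    # The article forbids coding root cause to 'Other'.
    if case.root_cause.strip().lower() == "other":
        blockers.append("root_cause coded to 'Other'")
    return blockers


def manager_follow_up_due(case):
    """True when the case is still open more than 24 hours past confirmation."""
    return case.hours_since_confirmed > 24
```

An empty list from `closure_blockers` means none of the listed checks failed; it does not replace the engineer's judgment about whether the customer has actually confirmed the solution.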

MORE INFORMATION

- Also reference Azure IaaS POD Case Handling Process (DRAFT) (TO BE ARCHIVED)
- See 3095136: Azure Support - POD Mail Distribution Groups
- Azure Pod: Lean Workshop Process Changes
- Get-Sub: Use Customer Case History to Identify Potentially at Risk Customers

KEYWORDS

aptsallupprocess watsallupprocess aptsbestpractices iaaspod case handling casehandling

Keywords: kb

Article Info

Article ID	:	2834722
Revision	:	8
Created on	:	4/16/2019
Published on	:	5/1/2019
Exists online	:	False
Views	:	213
