Overview of Monitoring in Azure
Azure provides a comprehensive suite of monitoring services that covers needs ranging from infrastructure monitoring to application performance monitoring (APM). These services include:
Azure Monitor: A centralized service for collecting, analyzing, and acting on telemetry from Azure resources as well as from other cloud and on-premises environments. It offers insights into the performance and health of applications and infrastructure, and it helps maximize their availability by showing how your applications are performing and by proactively identifying issues that affect them and the resources they depend on. You can think of Azure Monitor as a single "pane of glass" for work that was once handled by several separate services. Using its insights and metrics, you can then implement workflows that integrate with popular tooling in the market.
The following diagram gives a high-level view of Azure Monitor. At the center of the diagram are the data stores for metrics and logs, which are the two fundamental types of data used by Azure Monitor. On the left are the sources of monitoring data that populate these data stores. On the right are the functions that Azure Monitor performs with the collected data, such as analysis, alerting, and streaming to external systems.
Figure 1, Azure Monitor Overview (Microsoft documentation)
Azure Application Insights: Application Insights, a feature of Azure Monitor, is an extensible Application Performance Management (APM) service for developers and DevOps professionals. It is designed to monitor live applications and to help continuously improve their performance and usability. Application Insights automatically detects performance anomalies, and it includes analytics tools that help you diagnose issues and understand what users do with an application. It provides deep insights into application telemetry, including requests, dependencies, exceptions, and custom events.
Azure Log Analytics: A scalable data analytics solution that collects and analyzes log and telemetry data from Azure resources and on-premises environments. It enables real-time monitoring, advanced analytics, and proactive alerting.
Figure 3, Azure Log Analytics and Metric Explorer (Microsoft documentation)
All data collected by Azure Monitor fits into one of two fundamental types: metrics and logs. Metrics are numerical values that describe some aspect of a system at a particular point in time. They are lightweight and capable of supporting near real-time scenarios. Logs contain different kinds of data organized into records, with a different set of properties for each record type. Telemetry such as events and traces is stored as logs alongside performance data so that it can all be combined for analysis.
Log data collected by Azure Monitor is stored in Log Analytics, which includes a rich query language for quickly retrieving, consolidating, and analyzing the collected data. You can create and test queries on the Log Analytics page in the Azure portal, then either examine the results directly there or save the queries for use with visualizations or alert rules.
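The difference between the two data types can be illustrated with a small data model. This is only a sketch: the class names `MetricPoint` and `LogRecord` and all field names are hypothetical and do not reflect how Azure Monitor stores data internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class MetricPoint:
    """Hypothetical metric sample: one number at one point in time."""
    name: str
    value: float
    timestamp: datetime


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record: the property set varies by record type."""
    record_type: str
    timestamp: datetime
    properties: Dict[str, Any] = field(default_factory=dict)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# A metric is a fixed, lightweight shape.
cpu = MetricPoint(name="cpu_percent", value=73.5, timestamp=now)

# Two log records of different types carry different properties.
trace = LogRecord("trace", now, {"message": "cache miss", "severity": "info"})
failure = LogRecord("exception", now, {"type": "TimeoutError", "handled": False})
```

The fixed shape of a metric is what keeps it cheap enough for near real-time use, while the free-form properties of a log record are what make a query language necessary.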
Configure Instrumentation in an App or Service
Instrumentation involves adding code to your application or service to collect relevant telemetry data for monitoring and logging purposes. Here are the key steps to configure instrumentation in Azure:
- Choose the Right Tools: Depending on your requirements, select appropriate Azure monitoring services such as Azure Monitor, Application Insights, or Log Analytics.
- Instrumentation SDKs: Utilize SDKs provided by Azure for your programming language and platform to integrate monitoring capabilities directly into your application code.
- Define Metrics and Logging: Identify the critical metrics, logs, and events for monitoring and troubleshooting your application. Ensure proper logging of exceptions, performance metrics, and custom events.
- Configure Telemetry Sources: Instrument your application to emit telemetry data to Azure monitoring services using APIs or SDKs. This includes logging events, capturing performance metrics, and tracing dependencies.
- Set up Alerts and Notifications: Define alert rules based on key performance indicators and thresholds to receive proactive notifications via email, SMS, or other channels.
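To make the idea of instrumentation concrete, the following minimal sketch wraps a function so that every call records its duration and outcome. It uses only the Python standard library. The `TELEMETRY` list, the `instrumented` decorator, and `load_profile` are hypothetical names for illustration; in a real application the records would be handed to a monitoring SDK rather than kept in memory.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a monitoring SDK or exporter.
TELEMETRY: List[Dict[str, Any]] = []


def instrumented(operation: str) -> Callable:
    """Record the duration and outcome of each call to the wrapped function."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record: Dict[str, Any] = {"operation": operation, "success": True}
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Capture the failure, then let it propagate unchanged.
                record["success"] = False
                record["exception"] = type(exc).__name__
                raise
            finally:
                record["duration_ms"] = (time.perf_counter() - start) * 1000.0
                TELEMETRY.append(record)
        return wrapper
    return decorator


@instrumented("load_profile")
def load_profile(user_id: int) -> dict:
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return {"id": user_id}


load_profile(7)
try:
    load_profile(-1)
except ValueError:
    pass  # The failure has still been recorded in TELEMETRY.
```

Keeping the measurement in a decorator means the business logic stays free of timing code, and the same wrapper can be applied to any operation you decide is worth tracking.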
Analyzing and Troubleshooting Apps
Azure Monitor can collect data from a variety of sources. You can think of the monitoring data for your applications in tiers, ranging from the application itself, through the operating system and services it relies on, down to the platform. Azure Monitor collects data from each of the following tiers:
- Application monitoring data: Data about the performance and functionality of the code you have written, regardless of its platform.
- Guest OS monitoring data: Data about the operating system on which your application is running. This could be running in Azure, another cloud, or on-premises.
- Azure resource monitoring data: Data about the operation of an Azure resource.
- Azure subscription monitoring data: Data about the operation and management of an Azure subscription and data about the health and operation of Azure itself.
- Azure tenant monitoring data: Data about the operation of tenant-level Azure services, such as Azure Active Directory (now named Microsoft Entra ID).
Figure 4, Azure Monitor Collect (Microsoft documentation)
Azure Data Sources
Monitoring data in Azure comes from various sources that can be organized into tiers, the highest tiers being your application and any operating systems and the lower tiers being components of the Azure platform.
- Azure tenant: Telemetry related to your Azure tenant is collected from tenant-wide services such as Azure Active Directory.
- Azure platform: Telemetry related to the health and operation of Azure itself, together with data about the operation and management of your Azure subscription. It includes service-health data stored in the Azure Activity log and audit logs from Azure Active Directory.
- Guest operating system: Compute resources in Azure, in other clouds, and on-premises all have a guest operating system to monitor. Installing one or more agents lets you gather telemetry from the guest into the same monitoring tools used for the Azure services themselves.
- Applications: In addition to telemetry that your application may write to the guest operating system, detailed application monitoring is done with Application Insights. Application Insights can collect data from applications running on a variety of platforms, whether in Azure, another cloud, or on-premises.
- Custom sources: Azure Monitor can collect log data from any REST client using the Data Collector API. This allows you to create custom monitoring scenarios and extend monitoring to resources that don't expose telemetry through other sources.
Figure 5, Azure Data Sources (Microsoft documentation)
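For the custom-sources tier, the sketch below shows how a REST client might build a signed request for the Data Collector API. It follows the shared-key scheme as publicly documented for that API (an HMAC-SHA256 signature over the method, body length, content type, date header, and resource path), but treat the details as assumptions to verify against the current documentation. The function name `build_request`, the workspace ID, the key, and the log type are all placeholders, and the request is only constructed, never sent.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_request(workspace_id, shared_key, log_type, records, now=None):
    """Build (url, headers, body) for a Data Collector API POST.

    Sketch only: the signing scheme is assumed from the public
    documentation and should be checked before real use.
    """
    body = json.dumps(records).encode("utf-8")
    now = now or datetime.now(timezone.utc)
    # RFC 1123 date; assumes the default (C) locale for day/month names.
    date = now.strftime("%a, %d %b %Y %H:%M:%S GMT")
    resource = "/api/logs"
    string_to_sign = (
        f"POST\n{len(body)}\napplication/json\nx-ms-date:{date}\n{resource}"
    )
    digest = hmac.new(
        base64.b64decode(shared_key),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"SharedKey {workspace_id}:{signature}",
        "Log-Type": log_type,  # name of the custom log table
        "x-ms-date": date,
    }
    url = (
        f"https://{workspace_id}.ods.opinsights.azure.com"
        f"{resource}?api-version=2016-04-01"
    )
    return url, headers, body


# Placeholder credentials; not a real workspace or key.
WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"
SHARED_KEY = base64.b64encode(b"not-a-real-key").decode("ascii")

url, headers, body = build_request(
    WORKSPACE_ID,
    SHARED_KEY,
    "DeviceHeartbeat",
    [{"device": "sensor-1", "status": "ok"}],
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

The returned values can be passed to any HTTP client. Microsoft has also published a newer Logs Ingestion API, so check which one applies to your workspace before building on this.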
Azure Monitor Sources
All data collected by Azure Monitor fits into one of two fundamental types, metrics and logs.
Metrics are numerical values that describe some aspect of a system at a particular point in time. They are lightweight and capable of supporting near real-time scenarios.
Logs contain different kinds of data organized into records, with a different set of properties for each record type. Telemetry such as events and traces is stored as logs alongside performance data so that it can all be combined for analysis.
Figure 6, Azure Monitor Sources (Microsoft documentation)
Analyzing and troubleshooting applications in Azure involves leveraging the rich monitoring and logging data collected by Azure services. Here's how to effectively analyze and troubleshoot apps:
- Dashboard and Visualization: Create customized dashboards in Azure Monitor to visualize key metrics and performance indicators. Use charts, graphs, and tables to gain insights into the health and performance of your application.
- Query and Analytics: Use query languages such as KQL (Kusto Query Language) to analyze log and telemetry data in Azure Log Analytics. Write queries to identify trends, anomalies, and root causes of issues.
- Alerts and Notifications: Monitor alert notifications generated by Azure Monitor and Application Insights. Investigate and respond to alerts promptly to mitigate potential issues before they impact users.
- Diagnostic Tools: Leverage diagnostic tools provided by Azure services, such as Application Insights Profiler and Azure Diagnostics, to diagnose performance bottlenecks, memory leaks, and other issues.
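The kind of analysis a log query performs can be sketched in plain Python. The example below groups exception records into five-minute bins and flags the bins that reach a threshold, which is roughly what a KQL query such as `exceptions | summarize count() by bin(timestamp, 5m)` would compute. The sample records, the helper names, and the threshold are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_timestamp(ts, width):
    """Round a timestamp down to the start of its fixed-width bin."""
    return EPOCH + ((ts - EPOCH) // width) * width


def count_by_bin(records, width):
    """Count records per time bin, keyed by the bin's start time."""
    return Counter(bin_timestamp(r["timestamp"], width) for r in records)


def noisy_bins(counts, threshold):
    """Return the start times of bins with at least `threshold` records."""
    return sorted(start for start, n in counts.items() if n >= threshold)


def at(minute):
    # Helper for building sample timestamps on a fixed day.
    return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)


# Hypothetical exception records, as they might come back from a log query.
exceptions = [
    {"timestamp": at(1), "type": "TimeoutError"},
    {"timestamp": at(3), "type": "TimeoutError"},
    {"timestamp": at(7), "type": "ValueError"},
    {"timestamp": at(11), "type": "TimeoutError"},
    {"timestamp": at(14), "type": "ConnectionError"},
]

counts = count_by_bin(exceptions, timedelta(minutes=5))
spikes = noisy_bins(counts, threshold=2)
```

In practice you would run the aggregation in Log Analytics itself so that only the summarized rows leave the service; the Python version is here only to show what the grouping does.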
Implement Code That Handles Transient Faults
Transient faults are temporary errors in distributed systems caused by network issues, resource constraints, or brief failures of dependent services. An application that communicates with components running in the cloud must be able to tolerate the transient faults of that environment. Such faults include the momentary loss of network connectivity to components and services, the temporary unavailability of a service, and timeouts that occur when a service is busy.
These faults are typically self-correcting: if the action that triggered a fault is repeated after a suitable delay, it's likely to succeed. For example, a database service that's processing a large number of concurrent requests can implement a throttling strategy that temporarily rejects further requests until its workload has eased. An application trying to access the database might fail to connect, but it might succeed if it tries again after a delay. Here's how to implement code that handles transient faults effectively:
- Retry Policies: Implement retry logic in your code using resilience patterns such as exponential backoff, jitter, and circuit breaker. Use libraries like Polly for .NET or the retry support built into the Azure SDKs.
- Transient Fault Handling: Identify critical operations prone to transient faults, such as network requests and database queries. Wrap these operations with retry policies so that they are retried automatically when a transient failure occurs.
Figure 7, Handling transient error in code (Microsoft documentation)
Figure 8, Detecting transient error in code (Microsoft documentation)
- Error Handling and Logging: Properly handle and log transient faults in your application code. Log meaningful error messages, exception details, and telemetry data to aid in troubleshooting and root cause analysis.
- Monitor Retry Attempts: Monitor and track retry attempts and transient fault occurrences using Azure monitoring services. Set up alerts and notifications to be notified of excessive retries or persistent transient faults.
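The points above can be sketched as a small retry helper that combines transient-fault detection, exponential backoff with full jitter, and a hook for tracking retry attempts. This is a minimal illustration using only the standard library, not the policy used by any particular Azure SDK. `TransientError`, `is_transient`, `retry`, and `flaky_query` are hypothetical names, and the choice of which exception types count as transient is an assumption you would adapt to your own dependencies.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def is_transient(exc):
    # Assumption: timeouts and connection problems are treated as transient.
    return isinstance(exc, (TransientError, TimeoutError, ConnectionError))


def retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
          sleep=time.sleep, rng=random.random, on_retry=None):
    """Call `operation`, retrying transient faults with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-transient faults and on the final attempt.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Exponential backoff: the ceiling doubles on every attempt.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the ceiling.
            delay = rng() * ceiling
            if on_retry is not None:
                on_retry(attempt, exc, delay)  # hook for logging or metrics
            sleep(delay)


# Usage: an operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service busy")
    return "ok"


delays = []      # recorded instead of sleeping, to keep the demo fast
retry_log = []   # stands in for telemetry about retry attempts

result = retry(
    flaky_query,
    sleep=delays.append,
    on_retry=lambda attempt, exc, delay: retry_log.append((attempt, str(exc))),
)
```

Jitter spreads out retries from many clients so they do not all hit a recovering service at the same moment, and the `on_retry` hook is where you would emit the telemetry needed to alert on excessive retries.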
Best Practices
To ensure effective monitoring and logging in Azure, consider the following best practices:
- Start Early: Incorporate monitoring and instrumentation into your application design and development process from the outset.
- Granular Monitoring: Collect granular telemetry data to gain deep insights into the performance and behavior of your applications and services.
- Automate Monitoring: Automate the configuration and deployment of monitoring solutions using infrastructure-as-code tools like Azure Resource Manager (ARM) templates or Azure CLI.
- Continuous Improvement: Continuously review and refine your monitoring strategy based on evolving application requirements and changing business needs.
- Security and Compliance: Ensure that your monitoring and logging solutions adhere to security and compliance standards, such as GDPR and SOC 2.