18/05/2023, 11:16 Monitoring Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/monitoring

Monitoring Azure OpenAI Service

Article • 03/14/2023

When you have critical applications and business processes relying on Azure resources, you want to monitor those resources for their availability, performance, and operation.

This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of Cognitive Services, which uses Azure Monitor (https://learn.microsoft.com/en-us/azure/azure-monitor/overview). If you're unfamiliar with the features of Azure Monitor common to all Azure services that use it, read Monitoring Azure resources with Azure Monitor (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/monitor-azure-resource).

Monitoring data

Azure OpenAI collects the same kinds of monitoring data as other Azure resources that are described in Monitoring data from Azure resources (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/monitor-azure-resource#monitoring-data-from-azure-resources).

Collection and routing

Platform metrics and the Activity log are collected and stored automatically, but can be routed to other locations by using a diagnostic setting.

Resource Logs aren't collected and stored until you create a diagnostic setting and route them to one or more locations.

See Create diagnostic setting to collect platform logs and metrics in Azure (https://learn.microsoft.com/en-us/azure/azure-monitor/platform/diagnostic-settings) for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect.

Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has additional costs associated with it. To understand more, consult the Azure Monitor cost calculation guide (https://learn.microsoft.com/en-us/azure/azure-monitor/logs/cost-logs).

The metrics and logs you can collect are discussed in the following sections.

Analyzing metrics

You can analyze metrics for Azure OpenAI by opening Metrics, which can be found underneath the Monitoring section when viewing your Azure OpenAI resource in the Azure portal. See Getting started with Azure Metrics Explorer (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-getting-started) for details on using this tool.

Azure OpenAI is a part of Cognitive Services. For a list of all platform metrics collected for Cognitive Services and Azure OpenAI, see Cognitive Services supported metrics (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-supported#microsoftcognitiveservicesaccounts).

For the current subset of metrics available in Azure OpenAI:

Azure OpenAI Metrics

| Metric | Exportable via Diagnostic Settings? | Metric Display Name | Unit | Aggregation Type | Description | Dimensions |
|---|---|---|---|---|---|---|
| BlockedCalls | Yes | Blocked Calls | Count | Total | Number of calls that exceeded rate or quota limit. | ApiName, OperationName, Region, RatelimitKey |
| ClientErrors | Yes | Client Errors | Count | Total | Number of calls with client side error (HTTP response code 4xx). | ApiName, OperationName, Region, RatelimitKey |
| DataIn | Yes | Data In | Bytes | Total | Size of incoming data in bytes. | ApiName, OperationName, Region |
| DataOut | Yes | Data Out | Bytes | Total | Size of outgoing data in bytes. | ApiName, OperationName, Region |
| FineTunedTrainingHours | Yes | Processed FineTuned Training Hours | Count | Total | Number of Training Hours Processed on an OpenAI FineTuned Model | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| Latency | Yes | Latency | MilliSeconds | Average | Latency in milliseconds. | ApiName, OperationName, Region, RatelimitKey |
| Ratelimit | Yes | Ratelimit | Count | Total | The current ratelimit of the ratelimit key. | Region, RatelimitKey |
| ServerErrors | Yes | Server Errors | Count | Total | Number of calls with service internal error (HTTP response code 5xx). | ApiName, OperationName, Region, RatelimitKey |
| SuccessfulCalls | Yes | Successful Calls | Count | Total | Number of successful calls. | ApiName, OperationName, Region, RatelimitKey |
| TokenTransaction | Yes | Processed Inference Tokens | Count | Total | Number of Inference Tokens Processed on an OpenAI Model | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| TotalCalls | Yes | Total Calls | Count | Total | Total number of calls. | ApiName, OperationName, Region, RatelimitKey |
| TotalErrors | Yes | Total Errors | Count | Total | Total number of calls with error response (HTTP response code 4xx or 5xx). | ApiName, OperationName, Region, RatelimitKey |

Analyzing logs

Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.

All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common schema is outlined in Azure Monitor resource log schema (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/resource-logs-schema).

The Activity log (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/activity-log) is a type of platform log in Azure that provides insight into subscription-level events. You can view it independently or route it to Azure Monitor Logs, where you can do much more complex queries using Log Analytics.
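As a rough sketch of the kind of Log Analytics query this makes possible, the following summarizes Activity log events for Cognitive Services resources over the last seven days. It assumes the Activity log has already been routed to a Log Analytics workspace through a diagnostic setting, where it lands in the AzureActivity table. The column names used here (ResourceProviderValue, OperationNameValue, ActivityStatusValue, Caller) are taken from that table's schema rather than from this article, and the seven-day window is an arbitrary choice.

```kusto
// Assumption: the Activity log is routed to this workspace (AzureActivity table).
AzureActivity
| where TimeGenerated > ago(7d)                                   // arbitrary look-back window
| where ResourceProviderValue =~ "Microsoft.CognitiveServices"    // case-insensitive match
| summarize Events = count() by OperationNameValue, ActivityStatusValue, Caller
| order by Events desc
```

Because the Activity log is recorded at the subscription level, a query like this is best run with the workspace as the query scope rather than a single Azure OpenAI resource.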
For a list of the types of resource logs available for Azure OpenAI and other Cognitive Services, see Resource provider operations for Cognitive Services (https://learn.microsoft.com/en-us/azure/role-based-access-control/resource-provider-operations#microsoftcognitiveservices).

Kusto queries

Important: When you select Logs from the Azure OpenAI menu, Log Analytics is opened with the query scope set to the current Azure OpenAI resource. This means that log queries will only include data from that resource. If you want to run a query that includes data from other resources or data from other Azure services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure Monitor Log Analytics (https://learn.microsoft.com/en-us/azure/azure-monitor/logs/scope) for details.

To get a sense of what type of information is available for your Azure OpenAI resource, a useful query to start with, once you have deployed a model and sent some completion calls through the playground, is as follows:

Kusto

    AzureDiagnostics
    | take 100
    | project TimeGenerated, _ResourceId, Category, OperationName, DurationMs, ResultSignature, properties_s

This query returns a sample of 100 entries and displays a subset of the available columns of data in the logs. The results are as follows:

[Screenshot: sample results of the AzureDiagnostics query]

If you wish to see all available columns of data, you can remove the scoping that is provided by the | project line:

Kusto

    AzureDiagnostics
    | take 100

You can also select the arrow next to the table name to view all available columns and associated data types.

To examine AzureMetrics, run:

Kusto

    AzureMetrics
    | take 100
    | project TimeGenerated, MetricName, Total, Count, TimeGrain, UnitName

Alerts

Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your customers notice them. You can set alerts on metrics (https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-metric-overview), logs (https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-unified-log), and the activity log (https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/activity-log-alerts). Different types of alerts have different benefits and drawbacks.

Every organization's alerting needs vary and evolve over time. Generally, all alerts should be actionable, with a specific intended response if the alert occurs. If there's no action for someone to take, then it might be something you want to capture in a report, but not in an alert. Some use cases may require alerting anytime certain error conditions exist. But in many environments, an alert is warranted only when errors exceed a certain threshold for a period of time.

Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor Logs. As you analyze your log data over time, you may also find that a certain condition not occurring for a long enough period of time might be valuable to track with alerts. Sometimes the absence of an event in a log is just as important a signal as an error.
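One way to express the "errors above a threshold for a period of time" idea is a log query that could back a log alert rule. The sketch below assumes platform metrics are exported to a Log Analytics workspace through a diagnostic setting, so that they appear in AzureMetrics, and that the MetricName values match the TotalCalls and TotalErrors metric names listed in the metrics table. The 5% threshold and the 15-minute window are hypothetical values chosen for illustration.

```kusto
// Assumption: metrics are exported to this workspace (AzureMetrics table).
AzureMetrics
| where TimeGenerated > ago(1h)
| summarize
    Calls  = sumif(Total, MetricName == "TotalCalls"),
    Errors = sumif(Total, MetricName == "TotalErrors")
    by bin(TimeGenerated, 15m)                      // hypothetical evaluation window
| extend ErrorRatePct = iff(Calls > 0, 100.0 * Errors / Calls, 0.0)
| where ErrorRatePct > 5                            // hypothetical threshold
```

The absence of an event can be sketched in a similar way. Assuming resource logs are being collected into AzureDiagnostics, this query returns a row only when no requests were logged in the last hour, again an arbitrary period:

```kusto
// Returns one row only if nothing was logged in the look-back window.
AzureDiagnostics
| where TimeGenerated > ago(1h)
| summarize Requests = count()
| where Requests == 0
```

In both cases the query returns rows only when the condition is met, so an alert rule built on it could fire whenever the result count is greater than zero.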
Depending on what type of application you're developing in conjunction with your use of Azure OpenAI, Azure Monitor Application Insights (https://learn.microsoft.com/en-us/azure/azure-monitor/overview) may offer additional monitoring benefits at the application layer.

Next steps

See Monitoring Azure resources with Azure Monitor (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/monitor-azure-resource) for details on monitoring Azure resources.

Read Understand log searches in Azure Monitor logs (https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-query-overview).