azure-messaging

Troubleshoot and resolve issues with Azure Messaging SDKs for Event Hubs and Service Bus. Covers connection failures, authentication errors, message processing issues, and SDK configuration problems. WHEN: event hub SDK error, service bus SDK issue, messaging connection failure, AMQP error, event processor host issue, message lock lost, send timeout, receiver disconnected, SDK troubleshooting, azure messaging SDK, event hub consumer, service bus queue issue, topic subscription error, enable logging event hub, service bus logging, eventhub python, servicebus java, eventhub javascript, servicebus dotnet, event hub checkpoint, event hub not receiving messages, service bus dead letter.

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 423.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C3.5.

Composite score: 3.5.

Best-practice grade: C62.8.

Install command

npx @skill-hub/cli install microsoft-azure-skills-azure-messaging

Repository

microsoft/azure-skills

Skill path: .github/plugins/azure-skills/skills/azure-messaging

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: MIT.

Original source

Catalog source: SkillHub Club.

Repository owner: microsoft.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install azure-messaging into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/microsoft/azure-skills before adding azure-messaging to shared team environments
  • Use azure-messaging for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: azure-messaging
description: "Troubleshoot and resolve issues with Azure Messaging SDKs for Event Hubs and Service Bus. Covers connection failures, authentication errors, message processing issues, and SDK configuration problems. WHEN: event hub SDK error, service bus SDK issue, messaging connection failure, AMQP error, event processor host issue, message lock lost, send timeout, receiver disconnected, SDK troubleshooting, azure messaging SDK, event hub consumer, service bus queue issue, topic subscription error, enable logging event hub, service bus logging, eventhub python, servicebus java, eventhub javascript, servicebus dotnet, event hub checkpoint, event hub not receiving messages, service bus dead letter."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.2"
---

# Azure Messaging SDK Troubleshooting

## Quick Reference

| Property | Value |
|----------|-------|
| **Services** | Azure Event Hubs, Azure Service Bus |
| **MCP Tools** | `mcp_azure_mcp_eventhubs`, `mcp_azure_mcp_servicebus` |
| **Best For** | Diagnosing SDK connection, auth, and message processing issues |

## When to Use This Skill

- SDK connection failures, auth errors, or AMQP link errors
- Message lock lost, session lock, or send/receive timeouts
- Event processor or message handler stops processing
- SDK configuration questions (retry, prefetch, batch size)

## MCP Tools

| Tool | Command | Use |
|------|---------|-----|
| `mcp_azure_mcp_eventhubs` | Namespace/hub ops | List namespaces, hubs, consumer groups |
| `mcp_azure_mcp_servicebus` | Queue/topic ops | List namespaces, queues, topics, subscriptions |
| `mcp_azure_mcp_monitor` | `logs_query` | Query diagnostic logs with KQL |
| `mcp_azure_mcp_resourcehealth` | `get` | Check service health status |
| `mcp_azure_mcp_documentation` | Doc search | Search Microsoft Learn for troubleshooting docs |

## Diagnosis Workflow

1. **Identify the SDK and version** — Ask which language SDK and version the user is on
2. **Check resource health** — Use `mcp_azure_mcp_resourcehealth` to verify the namespace is healthy
3. **Review the error message** — Match against language-specific troubleshooting guide
4. **Look up documentation** — Use `mcp_azure_mcp_documentation` to search Microsoft Learn for the error or topic
5. **Check configuration** — Verify connection string, entity name, consumer group
6. **Recommend fix** — Apply remediation, citing documentation found


## Connectivity Troubleshooting

See [Service Troubleshooting Guide](references/service-troubleshooting.md) for ports, WebSocket fallback, IP firewall, private endpoints, and service tags.

## SDK Troubleshooting Guides

- **Event Hubs**: [Python](references/sdk/azure-eventhubs-py.md) | [Java](references/sdk/azure-eventhubs-java.md) | [JS](references/sdk/azure-eventhubs-js.md) | [.NET](references/sdk/azure-eventhubs-dotnet.md)
- **Service Bus**: [Python](references/sdk/azure-servicebus-py.md) | [Java](references/sdk/azure-servicebus-java.md) | [JS](references/sdk/azure-servicebus-js.md) | [.NET](references/sdk/azure-servicebus-dotnet.md)

## References

Use `mcp_azure_mcp_documentation` to search Microsoft Learn for latest guidance. See [Service Troubleshooting Guide](references/service-troubleshooting.md) for network and service-level docs.

---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/service-troubleshooting.md

```markdown
# Service-Level Troubleshooting

Covers connectivity, firewall, and network issues that apply regardless of SDK language.

## Permanent Connectivity Issues

If the client **cannot connect at all**:

1. **Verify connection string** — Get from Azure portal. For **Event Hubs (Kafka endpoint)** clients, also check `producer.config` / `consumer.config`.
2. **Check service outage** — [Azure status page](https://azure.status.microsoft/status)
3. **Firewall / ports** — Open AMQP 5671 and 5672, HTTPS 443. For **Event Hubs (Kafka endpoint)** only, also open Kafka 9093. Use WebSockets (port 443) as fallback.
4. **IP firewall** — If enabled on namespace, ensure client IP is allowed.
5. **VNet / private endpoints** — Confirm app runs in correct subnet. Check service endpoint and NSG rules.
6. **Proxy / SSL** — Intercepting proxies can cause SSL handshake failures. Test with proxy disabled.

### Quick Connectivity Test

```bash
# Test endpoint reachability (expect Atom feed XML on success)
curl -v https://<namespace>.servicebus.windows.net/

# Resolve namespace IP
nslookup <namespace>.servicebus.windows.net
```
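
If `curl` or `nslookup` isn't available on the host, the port checks from step 3 can be approximated with a plain TCP connect from Python. This is a minimal sketch: `can_connect` and `probe_namespace` are hypothetical helper names, and a successful TCP connect only shows that the port is reachable, not that TLS or AMQP negotiation (or an intercepting proxy) will succeed.

```python
import socket
from typing import Dict

# Ports from step 3: AMQP over TLS (5671), plain AMQP (5672), HTTPS / AMQP over WebSockets (443)
PORTS = (5671, 5672, 443)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses
        return False


def probe_namespace(namespace: str, timeout: float = 3.0) -> Dict[int, bool]:
    """Check each messaging port on <namespace>.servicebus.windows.net."""
    host = f"{namespace}.servicebus.windows.net"
    return {port: can_connect(host, port, timeout) for port in PORTS}
```

Usage: `print(probe_namespace("<namespace>"))`. If only 443 reports `True`, the AMQP ports are blocked and the WebSocket transport is the likely fix.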

## Transient Connectivity Issues

If connectivity is **intermittent**:

1. **Upgrade SDK** — Use latest version; transient issues may already be fixed.
2. **Check dropped packets** — `netstat -s` (Linux) or `netsh interface ipv4 show subinterface` (Windows).
3. **Capture network traces** — Use Wireshark or `tcpdump` filtered on namespace IP.
4. **Idle disconnect** — Service disconnects idle AMQP connections. Clients auto-reconnect; this is expected.

## WebSocket Configuration by Language

| Language | Setting |
|----------|---------|
| .NET | `EventHubsTransportType.AmqpWebSockets` / `ServiceBusTransportType.AmqpWebSockets` |
| Java | `AmqpTransportType.AMQP_WEB_SOCKETS` |
| Python | `transport_type=TransportType.AmqpOverWebsocket` |
| JavaScript | `webSocketOptions` in client constructor |

## Authentication Checklist

| Issue | Fix |
|-------|-----|
| Invalid connection string | Re-copy from Azure portal |
| Expired SAS token | Regenerate or increase validity |
| Missing RBAC role | Assign the corresponding *Azure Event Hubs Data Owner/Sender/Receiver* or *Azure Service Bus Data Owner/Sender/Receiver* role |
| Managed Identity not configured | Enable system/user-assigned identity, assign role on namespace |
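
When an expired SAS token is suspected, check the `se` field (expiry, Unix seconds) of the token the client is actually sending. The sketch below follows the documented SAS format (`sr`, `sig`, `se`, `skn`; HMAC-SHA256 over the URL-encoded resource URI and the expiry, keyed with the SAS key string). The helper names are hypothetical, and the SDKs generate and refresh tokens themselves when given a connection string, so this is mainly useful for hand-built tokens and for debugging.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def make_sas_token(resource_uri: str, key_name: str, key: str,
                   ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build a SAS token: HMAC-SHA256 over the URL-encoded URI and expiry joined by a newline."""
    expiry = int((time.time() if now is None else now) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    # The key string itself (UTF-8 bytes) is the HMAC key
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"


def token_expiry(token: str) -> int:
    """Return the 'se' (expiry, Unix seconds) field of a SAS token."""
    fields = dict(part.split("=", 1) for part in token.split(" ", 1)[1].split("&"))
    return int(fields["se"])


def is_expired(token: str, now: Optional[float] = None, skew_seconds: int = 0) -> bool:
    """True if the token expires at or before now + skew_seconds (skew covers clock drift)."""
    current = time.time() if now is None else now
    return token_expiry(token) <= current + skew_seconds
```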

## Sender Issues (All Languages)

- **Batch >1MB fails** — Service rejects batches over 1MB even with Premium large message support. Send large messages individually.
- **Multiple partition keys in batch** — Not allowed. Group messages by `partitionKey` (or `sessionId`) into separate batches.
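
The grouping rule can be planned before any batch is created. This is a language-neutral sketch in Python: `plan_batches` is a hypothetical helper, the 1 MB limit is an assumption for illustration, and raw body length ignores per-message AMQP overhead, so real SDK batch objects (which enforce the size advertised by the service) will fill somewhat sooner.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed limit for illustration; the SDK batch object uses the size advertised by the service link
MAX_BATCH_BYTES = 1024 * 1024


def plan_batches(messages: Iterable[Tuple[str, bytes]],
                 max_bytes: int = MAX_BATCH_BYTES) -> List[Tuple[str, List[bytes]]]:
    """Group (partition_key, body) pairs by key, then split each group so no batch exceeds max_bytes."""
    by_key: Dict[str, List[bytes]] = defaultdict(list)
    for key, body in messages:
        if len(body) > max_bytes:
            raise ValueError(f"{len(body)}-byte message exceeds the batch limit; send it individually")
        by_key[key].append(body)

    batches: List[Tuple[str, List[bytes]]] = []
    for key, bodies in by_key.items():
        current: List[bytes] = []
        size = 0
        for body in bodies:
            if current and size + len(body) > max_bytes:
                batches.append((key, current))  # close the full batch, start a new one
                current, size = [], 0
            current.append(body)
            size += len(body)
        if current:
            batches.append((key, current))
    return batches
```

With the Python Event Hubs SDK, each planned `(key, bodies)` pair would map to `producer.create_batch(partition_key=key)`; still handle the batch's own size check when adding events.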

## Receiver Issues (All Languages)

- **Batch receive returns fewer messages** — After the first message arrives, the receiver waits briefly (20ms–1s depending on SDK) for more. `maxWaitTime` only controls the wait for the *first* message.
- **Lock lost before expiry** — Can occur on AMQP link detach (transient network or 10-min idle timeout), not only when processing exceeds lock duration.
- **Socket exhaustion** — Treat clients as singletons. Each new client creates a new AMQP connection. Always close/dispose clients when done.
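
Because a batch receive can legitimately return fewer messages than requested, callers that need a fuller batch usually loop. A minimal, SDK-agnostic sketch: `receive_up_to` is a hypothetical helper, and `receive` stands in for a call such as Python's `receiver.receive_messages(max_message_count=n, max_wait_time=...)`. Locks on messages collected early keep ticking while the loop waits for more, so keep `max_rounds` and the wait time small relative to the lock duration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def receive_up_to(receive: Callable[[int], Sequence[T]], target: int, max_rounds: int = 5) -> List[T]:
    """Call receive(n) until `target` items are collected, a call returns nothing, or max_rounds is reached."""
    collected: List[T] = []
    for _ in range(max_rounds):
        remaining = target - len(collected)
        if remaining <= 0:
            break
        batch = receive(remaining)  # ask only for what is still missing
        if not batch:
            break  # nothing arrived within the wait time; stop instead of spinning
        collected.extend(batch)
    return collected
```

Usage: `receive_up_to(lambda n: receiver.receive_messages(max_message_count=n, max_wait_time=5), target=50)`.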

## Further Reading

- [Event Hubs troubleshooting guide](https://learn.microsoft.com/azure/event-hubs/troubleshooting-guide)
- [Service Bus troubleshooting guide](https://learn.microsoft.com/azure/service-bus-messaging/service-bus-troubleshooting-guide)
- [Event Hubs quotas and limits](https://learn.microsoft.com/azure/event-hubs/event-hubs-quotas)
- [Service Bus quotas and limits](https://learn.microsoft.com/azure/service-bus-messaging/service-bus-quotas)
- [Event Hubs AMQP troubleshooting](https://learn.microsoft.com/azure/event-hubs/event-hubs-amqp-troubleshoot)
- [Service Bus AMQP troubleshooting](https://learn.microsoft.com/azure/service-bus-messaging/service-bus-amqp-troubleshoot)
- [Event Hubs IP addresses and service tags](https://learn.microsoft.com/azure/event-hubs/troubleshooting-guide#what-ip-addresses-do-i-need-to-allow)
- [Service Bus IP addresses](https://learn.microsoft.com/azure/service-bus-messaging/service-bus-faq#what-ip-addresses-do-i-need-to-add-to-allowlist-)

```

### references/sdk/azure-eventhubs-py.md

```markdown
# Azure Event Hubs SDK — Python

Package: `azure-eventhub` | [README](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/eventhub/azure-eventhub) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/eventhub/azure-eventhub/TROUBLESHOOTING.md)

## Common Errors

| Exception | Cause | Fix |
|-----------|-------|-----|
| `EventHubError` | Base exception wrapping AMQP errors | Check `message`, `error`, `details` fields |
| `ConnectionLostError` | Idle connection disconnected | Auto-recovers on next operation; no action needed |
| `AuthenticationError` | Bad credentials or expired SAS | Regenerate key, check RBAC roles, verify connection string |
| `OperationTimeoutError` | Network or throttling | Check firewall, try WebSockets (port 443), increase timeout |

## Retry Configuration

> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```python
from azure.eventhub import EventHubProducerClient
from azure.identity import DefaultAzureCredential

client = EventHubProducerClient(
    fully_qualified_namespace="<your-namespace>.servicebus.windows.net",
    eventhub_name="<your-eventhub>",
    credential=DefaultAzureCredential(),
    retry_total=3,
    retry_backoff_factor=0.8,
    retry_backoff_max=120,
    retry_mode='exponential'
)
```
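
To see what these settings mean in wall-clock time, the sketch below tabulates the delay before each retry. It assumes exponential mode doubles `retry_backoff_factor` on each attempt and caps it at `retry_backoff_max`, while fixed mode reuses the factor; this is an approximation rather than the SDK's code, and the SDK may also stop retrying early when a backoff would overrun the operation timeout. `backoff_schedule` is a hypothetical helper.

```python
from typing import List


def backoff_schedule(retry_total: int, backoff_factor: float, backoff_max: float,
                     retry_mode: str = "exponential") -> List[float]:
    """Approximate delay in seconds before each of `retry_total` retries (assumed formula)."""
    delays = []
    for attempt in range(retry_total):
        if retry_mode == "fixed":
            delay = backoff_factor                 # same wait every time
        else:
            delay = backoff_factor * 2 ** attempt  # doubles per attempt
        delays.append(min(delay, backoff_max))     # never exceed the cap
    return delays
```

For the configuration above, `backoff_schedule(3, 0.8, 120)` returns `[0.8, 1.6, 3.2]`, about 5.6 s of backoff before the error surfaces (plus the time spent on each attempt).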

## Enable Logging

```python
import logging, sys

handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s | %(threadName)s | %(levelname)s | %(name)s | %(message)s"))
logger = logging.getLogger('azure.eventhub')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Enable AMQP frame tracing
from azure.eventhub import EventHubProducerClient
client = EventHubProducerClient(..., logging_enable=True)
```

## Key Issues

- **Buffered producer not sending**: Ensure enough `ThreadPoolExecutor` workers (one per partition). Use `buffer_concurrency` kwarg.
- **Blocking calls in async**: Run CPU-bound code in an executor; blocking the event loop impacts load balancing and checkpointing.
- **Consumer disconnected**: Expected during load balancing. If persistent with no scaling, file an issue.
- **Soft delete on checkpoint store**: Disable "soft delete" and "blob versioning" on the storage account used for checkpointing.
- **Always close clients**: Use `with` statement or call `close()` to avoid socket/connection leaks.
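
For long-lived apps where a `with` block doesn't fit, one way to share a single client (one AMQP connection) per process is a lazily created holder. This is a sketch: `SharedClient` is a hypothetical helper, and the factory can be any zero-argument callable that builds an SDK client with a `close()` method. The lock ensures concurrent first calls create only one client.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedClient(Generic[T]):
    """Lazily create one client per process and hand the same instance to every caller."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._client is None:
            with self._lock:
                if self._client is None:  # re-check so racing first calls build only one client
                    self._client = self._factory()
        return self._client

    def close(self) -> None:
        """Close the shared client once, at shutdown; a later get() creates a fresh one."""
        with self._lock:
            if self._client is not None:
                self._client.close()
                self._client = None
```

Usage: `producer = SharedClient(lambda: EventHubProducerClient(fully_qualified_namespace="<your-namespace>.servicebus.windows.net", eventhub_name="<your-eventhub>", credential=DefaultAzureCredential()))`, then `producer.get().send_batch(batch)` anywhere in the app and `producer.close()` on shutdown.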

## Checkpointing (BlobCheckpointStore)

Package: `azure-eventhub-checkpointstoreblob` (sync) / `azure-eventhub-checkpointstoreblob-aio` (async)

> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```python
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
checkpoint_store = BlobCheckpointStore(
    blob_account_url="https://<storage-account>.blob.core.windows.net",
    container_name="<checkpoint-container>",
    credential=credential
)
client = EventHubConsumerClient(
    fully_qualified_namespace="<your-namespace>.servicebus.windows.net",
    eventhub_name="<your-eventhub>",
    consumer_group="$Default",
    credential=credential,
    checkpoint_store=checkpoint_store
)
```

**Common issues:**
- **Soft delete / blob versioning**: Disable both on the storage account — they cause large delays during load balancing.
- **HTTP 412/409 from storage**: Normal during partition ownership negotiation; not an error.
- **Checkpoint frequency**: Checkpoint after processing each batch, not each event, to avoid storage throttling.
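
One way to checkpoint per batch rather than per event with `EventHubConsumerClient` is to count events per partition inside the `on_event` callback. This is a sketch: `CheckpointEveryN` and the `process` handler are hypothetical names, while `partition_context.partition_id` and `partition_context.update_checkpoint(event)` are the SDK's callback API. The trade-off is that up to `n - 1` events per partition may be reprocessed after a restart, so processing should be idempotent.

```python
from collections import defaultdict
from typing import Any, Callable, Dict


class CheckpointEveryN:
    """on_event callback that checkpoints each partition once every `n` processed events."""

    def __init__(self, process: Callable[[Any], None], n: int = 100):
        self._process = process                            # hypothetical per-event handler
        self._n = n
        self._pending: Dict[str, int] = defaultdict(int)   # events since last checkpoint, per partition

    def __call__(self, partition_context, event) -> None:
        if event is None:
            # The SDK passes None when max_wait_time elapses with no events
            return
        self._process(event)
        pid = partition_context.partition_id
        self._pending[pid] += 1
        if self._pending[pid] >= self._n:
            partition_context.update_checkpoint(event)  # persist this event's position
            self._pending[pid] = 0
```

Usage: `client.receive(on_event=CheckpointEveryN(handle_event, n=100), starting_position="-1")`.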

```

### references/sdk/azure-eventhubs-java.md

```markdown
# Azure Event Hubs SDK — Java

Package: `azure-messaging-eventhubs` | [README](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/eventhubs/azure-messaging-eventhubs/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/eventhubs/azure-messaging-eventhubs/TROUBLESHOOTING.md)

> ⚠️ **Note:** The detailed Java troubleshooting guide has moved to [Microsoft Learn](https://learn.microsoft.com/azure/developer/java/sdk/troubleshooting-messaging-event-hubs-overview).

## Common Errors

| Exception | Cause | Fix |
|-----------|-------|-----|
| `AmqpException` (connection:forced) | Idle connection disconnected | Auto-recovers; no action needed |
| `AmqpException` (unauthorized-access) | Bad credentials or missing permissions | Verify connection string, SAS, or RBAC roles |
| `AmqpException` (resource-limit-exceeded) | Too many concurrent receivers | Reduce receiver count or upgrade tier |
| `OperationTimeoutException` | Network issue or throttling | Check firewall, try AMQP over WebSockets (port 443) |

## Enable Logging

Configure via SLF4J. Add `logback-classic` dependency and set level for `com.azure.messaging.eventhubs`:

```xml
<logger name="com.azure.messaging.eventhubs" level="DEBUG"/>
```

For AMQP frame tracing:
```xml
<logger name="com.azure.core.amqp" level="DEBUG"/>
```

See [Java SDK logging docs](https://learn.microsoft.com/azure/developer/java/sdk/troubleshooting-messaging-event-hubs-overview) for details.

## Key Issues

- **High CPU / partition imbalance**: Limit to 1.5–3 partitions per CPU core.
- **Consumer disconnected**: Higher priority consumer took ownership. Expected during load balancing. Persistent issues without scaling indicate a problem.
- **Connection sharing**: Reuse clients, or call `shareConnection()` on `EventHubClientBuilder` so clients built from it share one AMQP connection; avoid creating new clients per operation.
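
The 1.5 to 3 partitions-per-core guidance turns into a quick capacity estimate. A sketch in Python, assuming the load balancer spreads partitions evenly across instances; `instances_needed` and `partitions_per_core` are hypothetical helpers, and instances beyond the partition count would sit idle.

```python
import math


def instances_needed(partitions: int, cores_per_instance: int, max_per_core: float = 3.0) -> int:
    """Fewest processor instances that keep every core at or below max_per_core partitions."""
    return math.ceil(partitions / (cores_per_instance * max_per_core))


def partitions_per_core(partitions: int, instances: int, cores_per_instance: int) -> float:
    """Average partitions owned per core once ownership has balanced across instances."""
    return partitions / (instances * cores_per_instance)
```

For 32 partitions on 4-core hosts, `instances_needed(32, 4)` is 3 (about 2.67 partitions per core); targeting the conservative end, `instances_needed(32, 4, max_per_core=1.5)` is 6.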

## Checkpointing (BlobCheckpointStore)

Package: `azure-messaging-eventhubs-checkpointstore-blob`

> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

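```java
import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.messaging.eventhubs.EventProcessorClient;
import com.azure.messaging.eventhubs.EventProcessorClientBuilder;
import com.azure.messaging.eventhubs.checkpointstore.blob.BlobCheckpointStore;
import com.azure.storage.blob.BlobContainerAsyncClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

TokenCredential credential = new DefaultAzureCredentialBuilder().build();

// The endpoint URL includes the container name
BlobContainerAsyncClient blobClient = new BlobContainerClientBuilder()
    .endpoint("https://<storage-account>.blob.core.windows.net/<checkpoint-container>")
    .credential(credential)
    .buildAsyncClient();

EventProcessorClient processor = new EventProcessorClientBuilder()
    .credential("<your-namespace>.servicebus.windows.net", "<your-eventhub>", credential)
    .consumerGroup("$Default")
    .checkpointStore(new BlobCheckpointStore(blobClient))
    .processEvent(eventContext -> {
        // process event
        // Checkpointing every event keeps the sample short; see "Checkpoint frequency" below
        eventContext.updateCheckpoint();
    })
    // An error handler is required; the builder rejects a null processError
    .processError(errorContext -> System.err.printf("Partition %s: %s%n",
        errorContext.getPartitionContext().getPartitionId(), errorContext.getThrowable()))
    .buildEventProcessorClient();

processor.start();
```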

**Common issues:**
- **Soft delete / blob versioning**: Disable both on the storage account — they cause delays during load balancing.
- **HTTP 412/409 from storage**: Normal during partition ownership negotiation; not an error.
- **Checkpoint frequency**: Call `updateCheckpoint()` per batch, not per event, to reduce storage calls.

## Filing Issues

Include: partition count, machine specs, instance count, max heap (`-Xmx`), average `EventData` size, traffic pattern, and DEBUG-level logs (±10 min from issue).

```

### references/sdk/azure-eventhubs-js.md

```markdown
# Azure Event Hubs SDK — JavaScript

Package: `@azure/event-hubs` | [README](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/eventhub/event-hubs/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/eventhub/event-hubs/TROUBLESHOOTING.md)

## Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| `MessagingError` (connection:forced) | Idle disconnect | Auto-recovers; no action needed |
| `MessagingError` (Unauthorized) | Bad credentials | Verify connection string, SAS, or RBAC roles |
| `MessagingError` (retryable: true) | Transient issue | Auto-retried per `RetryOptions`; if it reaches your code, all retries were exhausted |

`MessagingError` fields: `name`, `code`, `retryable`, `info`, `address`, `errno`, `port`, `syscall`.

## Enable Logging

```bash
# All SDK logs
export AZURE_LOG_LEVEL=verbose

# Or use DEBUG for granular control
export DEBUG=azure*,rhea*

# Errors only
export DEBUG=azure:*:(error|warning),rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow
```

Browser:
```javascript
localStorage.debug = "azure:*:info";
```

## Key Issues

- **Socket exhaustion**: Treat clients as singletons. Each new client creates a new AMQP connection/socket. Always call `close()`.
- **412 precondition failures**: Normal during subscription partition ownership negotiation.
- **Partition ownership churn**: Expected when scaling instances. Should stabilize within minutes.
- **High CPU**: Limit to 1.5–3 partitions per CPU core.
- **Subscription stops receiving**: Often a symptom of an underlying race condition during error recovery. File a GitHub issue with DEBUG logs.
- **WebSockets**: Pass `webSocketOptions` to client constructor to connect over port 443.

## Checkpointing (BlobCheckpointStore)

Package: `@azure/eventhubs-checkpointstore-blob`

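```javascript
const { EventHubConsumerClient } = require("@azure/event-hubs");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");
const { BlobServiceClient } = require("@azure/storage-blob");
const { DefaultAzureCredential } = require("@azure/identity");

const credential = new DefaultAzureCredential();
const storageEndpoint = "https://<storage-account>.blob.core.windows.net";

const containerClient = new BlobServiceClient(storageEndpoint, credential)
  .getContainerClient("checkpointstore");
const checkpointStore = new BlobCheckpointStore(containerClient);

const consumerClient = new EventHubConsumerClient(
  "$Default",                                 // consumer group
  "<your-namespace>.servicebus.windows.net",  // fully qualified namespace
  "<your-eventhub>",                          // event hub name
  credential,
  checkpointStore
);
```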

**Common issues:**
- **Soft delete / blob versioning**: Disable both on the storage account — they cause delays during load balancing.
- **412 precondition failures**: Normal during partition ownership negotiation; not an error.
- **Checkpoint frequency**: Call `updateCheckpoint()` per batch, not per event, to reduce storage calls.

```

### references/sdk/azure-eventhubs-dotnet.md

```markdown
# Azure Event Hubs SDK — .NET (C#)

Package: `Azure.Messaging.EventHubs` | [README](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/TROUBLESHOOTING.md)

## Common Errors

| Exception | Reason | Fix |
|-----------|--------|-----|
| `EventHubsException` (ServiceTimeout) | Service didn't respond in time | Transient — retried automatically. Verify state if persists |
| `EventHubsException` (QuotaExceeded) | Too many active readers per consumer group | Reduce concurrent receivers or upgrade tier |
| `EventHubsException` (ConsumerDisconnected) | Higher priority consumer took ownership | Expected during load balancing; check if scaling |
| `EventHubsException` (MessageSizeExceeded) | Event too large | Reduce event payload; unlikely in practice since the client caps at the service link limit |
| `UnauthorizedAccessException` | Bad credentials | Verify connection string, SAS token, or RBAC roles |

## Exception Filtering

```csharp
try { /* receive events */ }
catch (EventHubsException ex) when (ex.Reason == EventHubsException.FailureReason.ConsumerDisconnected)
{
    // Handle consumer disconnection
}
```

## Retry Configuration

Configure via `EventHubsRetryOptions` when creating the client. See [Configuring retry thresholds sample](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/samples).

## Key Issues

- **Socket exhaustion**: Treat clients as singletons. Share `EventHubConnection` across clients if needed. Always call `CloseAsync` or `DisposeAsync`.
- **HTTP 412/409 from storage**: Normal during checkpoint store operations — not an error.
- **Partitions closing frequently**: Expected when scaling. If persists >5 min without scaling, investigate. See [Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/TROUBLESHOOTING.md) for detailed diagnostics.
- **High CPU**: Limit to 1.5–3 partitions per CPU core and test at scale thoroughly if above that threshold.
- **Azure Functions**: After upgrading to v5.0+ extensions, update binding types. Reduce logging noise by filtering `Azure.Messaging.EventHubs` to Warning.
- **WebSockets**: Use `EventHubsTransportType.AmqpWebSockets` to connect over port 443 when AMQP ports (5671, 5672) are blocked.

## Checkpointing (BlobCheckpointStore)

Package: `Azure.Messaging.EventHubs.Processor` (includes `EventProcessorClient` + blob checkpoint store)

> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```csharp
var credential = new DefaultAzureCredential();

var storageClient = new BlobContainerClient(
    new Uri("https://<storage-account>.blob.core.windows.net/<checkpoint-container>"),
    credential);

var processor = new EventProcessorClient(
    storageClient,
    "$Default",
    "<your-namespace>.servicebus.windows.net",
    "<your-eventhub>",
    credential);

processor.ProcessEventAsync += async (args) =>
{
    // process event
    await args.UpdateCheckpointAsync();
};
```

**Common issues:**
- **Soft delete / blob versioning**: Disable both on the storage account — they cause delays during load balancing.
- **HTTP 412/409 from storage**: Normal during partition ownership negotiation; not an error.
- **Checkpoint frequency**: Call `UpdateCheckpointAsync()` per batch, not per event, to reduce storage calls.

```

### references/sdk/azure-servicebus-py.md

```markdown
# Azure Service Bus SDK — Python

Package: `azure-servicebus` | [README](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/servicebus/azure-servicebus/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/servicebus/azure-servicebus/TROUBLESHOOTING.md)

## Common Errors

| Exception | Cause | Fix |
|-----------|-------|-----|
| `ServiceBusAuthenticationError` | Invalid credentials | Check connection string, regenerate SAS key |
| `ServiceBusAuthorizationError` | Missing Send/Listen claim | Assign `Azure Service Bus Data Owner/Sender/Receiver` RBAC role |
| `ServiceBusConnectionError` | Network or firewall | Check AMQP port 5671, try `TransportType.AmqpOverWebsocket` |
| `OperationTimeoutError` | Service didn't respond in time | Adjust retry config, verify network |
| `MessageLockLostError` | Processing exceeded lock duration | Use `AutoLockRenewer`, reduce processing time |
| `SessionLockLostError` | Session lock expired | Reconnect to session, keep renewing lock |
| `MessageSizeExceededError` | Message or batch too large | Reduce payload. Premium supports individual messages up to 100MB. Batch limit is computed from max message size on the client, so batches can also be impacted |

## Enable Logging

```python
import logging, sys

handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s | %(threadName)s | %(levelname)s | %(name)s | %(message)s"))
logger = logging.getLogger('azure.servicebus')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Enable AMQP frame tracing
from azure.servicebus import ServiceBusClient
client = ServiceBusClient(..., logging_enable=True)
```

## AutoLockRenewer

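```python
from azure.identity import DefaultAzureCredential
from azure.servicebus import AutoLockRenewer, ServiceBusClient

client = ServiceBusClient("<your-namespace>.servicebus.windows.net", DefaultAzureCredential())
renewer = AutoLockRenewer()
try:
    with client, client.get_queue_receiver(queue_name="<your-queue>") as receiver:
        for message in receiver.receive_messages(max_message_count=10):
            # Renew this message's lock in the background for up to 300 s
            renewer.register(receiver, message, max_lock_renewal_duration=300)
            # process message
            receiver.complete_message(message)
finally:
    renewer.close()  # stop background renewal
```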

## Key Issues

- **Mixing sync/async**: Don't use `time.sleep()` in async code; use `await asyncio.sleep()`.
- **Dead letter debugging**: Use `sub_queue=ServiceBusSubQueue.DEAD_LETTER` to inspect `dead_letter_reason` and `dead_letter_error_description`.
- **Always close clients**: Use `with` statement or call `close()` to avoid connection leaks.
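
When many messages are dead-lettered, tallying them by reason usually points at the cause faster than reading them one by one. A sketch: `summarize_dead_letters` is a hypothetical helper that relies only on the `dead_letter_reason` and `dead_letter_error_description` properties of received messages; the commented usage assumes a live namespace, and peeking does not lock or remove messages.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_dead_letters(messages: Iterable) -> "Counter[Tuple[str, str]]":
    """Count dead-lettered messages by (dead_letter_reason, dead_letter_error_description)."""
    return Counter((m.dead_letter_reason, m.dead_letter_error_description) for m in messages)


# Example against a real queue:
# from azure.identity import DefaultAzureCredential
# from azure.servicebus import ServiceBusClient, ServiceBusSubQueue
# with ServiceBusClient("<your-namespace>.servicebus.windows.net", DefaultAzureCredential()) as client:
#     with client.get_queue_receiver("<your-queue>", sub_queue=ServiceBusSubQueue.DEAD_LETTER) as dlq:
#         summary = summarize_dead_letters(dlq.peek_messages(max_message_count=100))
#         for (reason, description), count in summary.most_common():
#             print(count, reason, description)
```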

```

### references/sdk/azure-servicebus-java.md

```markdown
# Azure Service Bus SDK — Java

Package: `azure-messaging-servicebus` | [README](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/servicebus/azure-messaging-servicebus/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/servicebus/azure-messaging-servicebus/TROUBLESHOOTING.md)

## Common Errors

| Exception | Cause | Fix |
|-----------|-------|-----|
| `AmqpException` (unauthorized-access) | Bad credentials or missing permissions | Verify connection string, SAS, or RBAC roles |
| `AmqpException` (connection:forced) | Idle connection or transient network issue | Auto-recovers; no action needed |
| `ServiceBusException` (MESSAGE_LOCK_LOST) | Lock expired during processing | Reduce processing time, disable auto-complete, settle manually |

## Key Issues

### Processor hangs with high prefetch + maxConcurrentCalls

`Update disposition request timed out.` — Client stops processing new messages.

**Cause**: Thread starvation when thread pool size ≤ `maxConcurrentCalls`.

**Fix**:
```bash
# Increase the Reactor bounded-elastic thread pool (JVM system property)
java -Dreactor.schedulers.defaultBoundedElasticSize=<value greater than concurrency> -jar <your-app>.jar
```
Also set `prefetchCount(0)` to disable prefetch. This is more frequent in AKS environments.

### Implicit prefetch in ServiceBusReceiverClient

Even with prefetch disabled in the builder, `receiveMessages` API can re-enable prefetch implicitly. See [SyncReceiveAndPrefetch](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/servicebus/azure-messaging-servicebus/docs/SyncReceiveAndPrefetch.md).

### Autocomplete issues

Autocomplete and auto-lock-renewal have known issues with buffered/prefetched messages.

**Fix**: Use `disableAutoComplete()` and `.maxAutoLockRenewalDuration(Duration.ZERO)`, then settle messages explicitly.

## Enable Logging

Configure via SLF4J:
```xml
<logger name="com.azure.messaging.servicebus" level="DEBUG"/>
```

See [Java SDK logging docs](https://learn.microsoft.com/azure/developer/java/sdk/troubleshooting-messaging-service-bus-overview) for details.

## Filing Issues

Include: namespace tier, entity type/config, machine specs, max heap (`-Xmx`), `maxConcurrentCalls`, `prefetchCount`, autoComplete setting, traffic pattern, and DEBUG-level logs (±10 min from issue).

```

### references/sdk/azure-servicebus-js.md

```markdown
# Azure Service Bus SDK — JavaScript

Package: `@azure/service-bus` | [README](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/servicebus/service-bus/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/servicebus/service-bus/TROUBLESHOOTING.md)

## Common Errors

| Error Code | Cause | Fix |
|------------|-------|-----|
| `ServiceTimeout` | Service didn't respond; or no unlocked sessions | Transient — auto-retried. Verify state if persists |
| `MessageLockLost` | Processing exceeded lock duration or link detached | Reduce processing time, ensure autolock renewal works |
| `SessionLockLost` | Session lock expired or link detached | Re-accept session, keep renewing lock |
| `QuotaExceeded` | Too many concurrent receives | Reduce receivers or use batch receives |
| `MessageSizeExceeded` | Message or batch > max size | Reduce payload. Premium supports individual messages up to 100MB. Batch limit is computed from max message size on the client, so batches can also be impacted |
| `UnauthorizedAccess` | Bad credentials | Verify connection string, SAS, or RBAC roles |

`ServiceBusError` fields: `code`, `retryable`, `name`, `info`, `address`.

## Enable Logging

```bash
# All SDK logs
export AZURE_LOG_LEVEL=verbose

# Or granular control
export DEBUG=azure*,rhea*

# Errors only
export DEBUG=azure:service-bus:error,azure:core-amqp:error,rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow
```

Log to file:
```bash
node app.js > out.log 2>debug.log
```

## Key Issues

- **Socket exhaustion**: Treat `ServiceBusClient` as singleton. Each creates a new AMQP connection. Always call `close()`.
- **Lock lost before expiry**: Can happen on link detach (transient network issue or 10-min idle timeout). Not always due to processing time.
- **Batch receive returns fewer messages**: After first message arrives, receiver waits only 1s for additional messages. `maxWaitTimeInMs` controls wait for the *first* message only.
- **Autolock renewal not working**: Ensure system clock is accurate. Autolock relies on system time.
- **Batch size limits**: Batch limit is artificially computed on the client from the max message size sent by the service. Send large messages individually if batch creation fails.
- **WebSockets**: Pass `webSocketOptions` to `ServiceBusClient` constructor for port 443 connectivity.
- **Distributed tracing**: Experimental OpenTelemetry support via `@azure/opentelemetry-instrumentation-azure-sdk`.

```

### references/sdk/azure-servicebus-dotnet.md

```markdown
# Azure Service Bus SDK — .NET (C#)

Package: `Azure.Messaging.ServiceBus` | [README](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/servicebus/Azure.Messaging.ServiceBus/) | [Full Troubleshooting Guide](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/servicebus/Azure.Messaging.ServiceBus/TROUBLESHOOTING.md)

## Common Errors

| Exception | Reason | Fix |
|-----------|--------|-----|
| `ServiceBusException` (ServiceTimeout) | Service didn't respond | Transient — auto-retried. For session accept, means no unlocked sessions |
| `ServiceBusException` (MessageLockLost) | Lock expired or link detached | Renew lock, reduce processing time, check network |
| `ServiceBusException` (SessionLockLost) | Session lock expired | Re-accept session, renew lock before expiry |
| `ServiceBusException` (QuotaExceeded) | Too many concurrent receive calls | Reduce the number of receivers or use batch receives |
| `ServiceBusException` (MessageSizeExceeded) | Message or batch too large | Reduce the payload. Premium tier supports individual messages up to 100 MB. Because the batch limit is computed on the client from the maximum message size advertised by the service, batches can hit this limit too |
| `ServiceBusException` (ServiceBusy) | Request throttled | Auto-retried with 10s backoff. See [throttling docs](https://learn.microsoft.com/azure/service-bus-messaging/service-bus-throttling) |
| `UnauthorizedAccessException` | Bad credentials | Verify connection string, SAS, or RBAC roles |

## Exception Filtering

```csharp
try { /* receive messages */ }
catch (ServiceBusException ex) when (ex.Reason == ServiceBusFailureReason.ServiceTimeout)
{
    // Handle timeout
}
```

## Key Issues

- **Socket exhaustion**: Treat `ServiceBusClient` as a singleton, because each instance opens its own AMQP connection. Always call `CloseAsync`/`DisposeAsync` when you are done with it.
- **Lock lost before expiry**: A lock can be lost when the link detaches, either from a transient network issue or the 10-minute idle timeout.
- **Processor high concurrency**: Extreme concurrency settings may cause the processor to hang. Test with moderate values first.
- **Session processor slow switching**: Lower `SessionIdleTimeout` to shorten the wait before the processor moves on to the next session.
- **Batch size limits**: The batch limit is computed on the client from the maximum message size advertised by the service. If batch creation fails, send large messages individually.
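- **Transactions across entities**: All entities must be in the same namespace, and the client must be created with `ServiceBusClientOptions.EnableCrossEntityTransactions` set to `true`. The entity used by the first operation in the transaction becomes the `via` entity for subsequent sends.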
- **WebSockets**: Use `ServiceBusTransportType.AmqpWebSockets` to connect over port 443 when the AMQP ports (5671, 5672) are blocked.

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### references/auth-best-practices.md

```markdown
# Azure Authentication Best Practices

> Source: [Microsoft — Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/).

## Golden Rule

Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**.

## Authentication by Environment

| Environment | Recommended Credential | Why |
|---|---|---|
| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure |
| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead |
| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity |
| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience |

## Why Not `DefaultAzureCredential` in Production?

1. **Unpredictable fallback chain** — walks through multiple credential types, adding latency and making failures harder to diagnose.
2. **Broad surface area** — checks environment variables, CLI tokens, and other sources that should not exist in production.
3. **Non-deterministic** — which credential actually authenticates depends on the environment, making behavior inconsistent across deployments.
4. **Performance** — each failed credential attempt adds network round-trips before falling back to the next.

## Production Patterns

### .NET

```csharp
using Azure.Core;
using Azure.Identity;

TokenCredential credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    ? new DefaultAzureCredential()                          // local dev — uses CLI/VS credentials
    : new ManagedIdentityCredential();                      // production — deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### TypeScript / JavaScript

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

const credential = process.env.NODE_ENV === "development"
  ? new DefaultAzureCredential()                          // local dev — uses CLI/VS credentials
  : new ManagedIdentityCredential();                      // production — deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### Python

```python
import os
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

credential = (
    DefaultAzureCredential()                              # local dev — uses CLI/VS credentials
    if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    else ManagedIdentityCredential()                      # production — deterministic, no fallback chain
)
# For user-assigned identity: ManagedIdentityCredential(client_id="<client-id>")
```

### Java

```java
import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;

TokenCredential credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT"))
    ? new DefaultAzureCredentialBuilder().build()          // local dev — uses CLI/VS credentials
    : new ManagedIdentityCredentialBuilder().build();      // production — deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("<client-id>").build()
```

## Local Development Setup

`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools:

1. **Azure CLI** — `az login`
2. **Azure Developer CLI** — `azd auth login`
3. **Azure PowerShell** — `Connect-AzAccount`
4. **Visual Studio / VS Code** — sign in via Azure extension

```typescript
import { DefaultAzureCredential } from "@azure/identity";

// Local development only — uses CLI/PowerShell/VS Code credentials
const credential = new DefaultAzureCredential();
```

## Environment-Aware Pattern

Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production.

> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`).

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

function getCredential() {
  if (process.env.NODE_ENV === "development") {
    return new DefaultAzureCredential();          // picks up az login / VS Code creds
  }
  return process.env.AZURE_CLIENT_ID
    ? new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID)  // user-assigned
    : new ManagedIdentityCredential();                            // system-assigned
}
```

## Security Checklist

- [ ] Use managed identity for all Azure-hosted apps
- [ ] Never hardcode credentials, connection strings, or keys
- [ ] Apply least-privilege RBAC roles at the narrowest scope
- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production
- [ ] Store any required secrets in Azure Key Vault
- [ ] Rotate secrets and certificates on a schedule
- [ ] Enable Microsoft Defender for Cloud on production resources

## Further Reading

- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview)
- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview)
- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview)
- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/)
- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme)
- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme)
- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme)

```