Troubleshooting RDP Error 0x4: 'Out of Memory' During High-Traffic Scaling Events

Quick Fix Summary

TL;DR

Restart the Remote Desktop Services service to clear session memory.

Error 0x4 indicates the RDP server cannot allocate sufficient non-paged pool memory for new sessions, often due to a memory leak in the TermDD.sys driver or system-wide pool exhaustion during scaling.

Diagnosis & Causes

  • TermDD.sys driver memory leak in Windows Server 2012 R2/2016.
  • System-wide non-paged pool exhaustion from other drivers or processes.
  • Insufficient RAM or misconfigured virtual memory (pagefile) for concurrent session load.
Recovery Steps

    Step 1: Verify Memory State and Identify Leak Source

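    Confirm non-paged pool exhaustion and check for the known TermDD.sys leak. Use PowerShell to query the memory counters, then use a pool-tag tool to attribute usage to individual drivers.

    powershell
    # Check available memory and total non-paged pool usage
    Get-Counter '\Memory\Available Bytes', '\Memory\Pool Nonpaged Bytes'
    
    # Read the same total from WMI. This is a system-wide figure, not a per-driver breakdown.
    Get-CimInstance Win32_PerfRawData_PerfOS_Memory | Select-Object PoolNonpagedBytes
    
    # For per-tag (per-driver) pool usage, run poolmon.exe from the Windows Driver Kit (requires admin).
    
    # TermDD.sys leak pattern (2012 R2/2016): Pool Nonpaged Bytes climbs as sessions connect
    # and does not fall back when they log off. Compare it against the session count:
    Get-Counter '\Terminal Services\Active Sessions'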
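
A single counter reading rarely proves a leak; the trend over time does. The following is a minimal sketch, assuming you collect samples at a fixed interval. `Get-PoolGrowthRate` is a hypothetical helper name, not a built-in cmdlet, and it uses a simple first-to-last slope rather than a proper regression.

```powershell
# Hypothetical helper: estimate counter growth in bytes per hour
# from evenly spaced samples (oldest first).
function Get-PoolGrowthRate {
    param(
        [Parameter(Mandatory)] [double[]] $Values,
        [Parameter(Mandatory)] [double]   $IntervalSeconds
    )
    if ($Values.Count -lt 2) { throw 'Need at least two samples.' }
    $elapsedHours = (($Values.Count - 1) * $IntervalSeconds) / 3600
    return ($Values[-1] - $Values[0]) / $elapsedHours
}

# Usage on a live Windows host (30 samples, one per minute):
# $samples = (Get-Counter '\Memory\Pool Nonpaged Bytes' -SampleInterval 60 -MaxSamples 30).CounterSamples.CookedValue
# Get-PoolGrowthRate -Values $samples -IntervalSeconds 60
```

A rate that stays positive while the session count is flat or falling is the pattern worth investigating.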

    Step 2: Apply Critical OS and Driver Patches

    Install the latest cumulative update for your Windows Server version, which often contains fixes for TermDD.sys and other memory leaks. Ensure all hardware drivers (especially network) are updated.

    powershell
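    # Check current OS build (major.minor.build; this does not show the installed update revision)
    [System.Environment]::OSVersion.Version
    
    # List the most recently installed updates
    Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5
    
    # Manually download and install the latest Cumulative Update from the Microsoft Update Catalog.
    # Update all drivers via Device Manager or vendor tools.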
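
To turn the patch check into something you can alert on, one option is to compute how long ago the newest update was installed. This is a sketch only: `Get-DaysSinceLastPatch` is a hypothetical helper, and it assumes input objects with an `InstalledOn` property such as those returned by `Get-HotFix`.

```powershell
# Hypothetical helper: whole days since the most recent update in a list.
function Get-DaysSinceLastPatch {
    param(
        [Parameter(Mandatory)] [object[]] $HotFixes,
        [datetime] $Now = (Get-Date)
    )
    # Some entries can lack an install date, so skip those.
    $latest = $HotFixes |
        Where-Object { $_.InstalledOn } |
        Sort-Object InstalledOn -Descending |
        Select-Object -First 1
    if (-not $latest) { throw 'No hotfix entries with an install date.' }
    return [math]::Floor(($Now - $latest.InstalledOn).TotalDays)
}

# Usage on a Windows host:
# Get-DaysSinceLastPatch -HotFixes (Get-HotFix)
```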

    Step 3: Increase System Commit Limit and Optimize Memory

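    Ensure the pagefile is system-managed or sized to at least 1.5x RAM. This raises the commit limit; it does not enlarge non-paged pool, which always stays resident in physical memory. If the pool remains under pressure, cap the number of concurrent RDP sessions through Group Policy.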

    powershell
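    # View current pagefile settings (wmic is deprecated on newer Windows builds)
    Get-CimInstance Win32_PageFileUsage | Format-List Name, AllocatedBaseSize, CurrentUsage, PeakUsage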
    
    # Set pagefile to system-managed via PowerShell (Admin)
    $ComputerSystem = Get-WmiObject -Class Win32_ComputerSystem -EnableAllPrivileges
    $ComputerSystem.AutomaticManagedPagefile = $True
    $ComputerSystem.Put()
    
    # Limit RDP sessions via GPO: Computer Config -> Policies -> Admin Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host -> Connections -> 'Limit Number of Connections'
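
If you size the pagefile manually, the 1.5x guideline above reduces to simple arithmetic. The sketch below assumes that guideline; `Get-MinimumPagefileMB` and its `Multiplier` parameter are hypothetical names chosen for illustration.

```powershell
# Hypothetical helper: minimum pagefile size in MB for a given amount of RAM.
function Get-MinimumPagefileMB {
    param(
        [Parameter(Mandatory)] [uint64] $RamBytes,
        [double] $Multiplier = 1.5
    )
    return [math]::Ceiling(($RamBytes / 1MB) * $Multiplier)
}

# Usage on a Windows host:
# Get-MinimumPagefileMB -RamBytes (Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory
```

For a host with 16 GB of RAM this yields 24576 MB.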

    Step 4: Implement Proactive Monitoring and Restart Schedule

    For persistent leaks, configure alerts on Pool Nonpaged Bytes and schedule regular, staggered restarts of the Remote Desktop Services service during maintenance windows.

    powershell
    # Create a scheduled task to restart the service (example). This disconnects all active sessions.
    $Action = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -Command "Restart-Service -Name TermService -Force"'
    $Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 3AM
    Register-ScheduledTask -TaskName 'Weekly_RDS_Restart' -Action $Action -Trigger $Trigger -User 'SYSTEM' -RunLevel Highest
    
    # Monitor with PerfMon: Log \Memory\Pool Nonpaged Bytes and \Terminal Services\Active Sessions
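
Staggering matters when several session hosts share a farm: restarting them all at once removes all capacity. A minimal sketch of spacing the restarts is shown here. `New-StaggeredSchedule`, the host names, and the 30-minute gap are all illustrative assumptions.

```powershell
# Hypothetical helper: assign each host a restart time, spaced by a fixed gap.
function New-StaggeredSchedule {
    param(
        [Parameter(Mandatory)] [string[]] $HostNames,
        [Parameter(Mandatory)] [datetime] $WindowStart,
        [int] $GapMinutes = 30
    )
    for ($i = 0; $i -lt $HostNames.Count; $i++) {
        [pscustomobject]@{
            HostName  = $HostNames[$i]
            RestartAt = $WindowStart.AddMinutes($i * $GapMinutes)
        }
    }
}

# Usage (host names are placeholders):
# New-StaggeredSchedule -HostNames 'rdsh01','rdsh02','rdsh03' -WindowStart '2024-01-07T03:00:00'
```

Each `RestartAt` value can then feed the `-At` parameter of a per-host trigger.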

    Step 5: Scale Horizontally and Isolate Workloads

    For cloud/VM environments, implement a session host farm behind a connection broker. Use scaling policies based on memory pressure, not just CPU. Isolate high-memory applications to dedicated hosts.

    bash
    # Azure CLI example to scale out VMSS (concept)
    az vmss scale --name <sessionHostScaleSet> --resource-group <rg> --new-capacity <higherNumber>
    
    # AWS CLI example to increase ASG desired capacity
    aws autoscaling set-desired-capacity --auto-scaling-group-name <asg-name> --desired-capacity <number> --honor-cooldown
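
The capacity number passed to those commands has to come from somewhere. As a rough sketch of a memory-driven rule, the function below adds one host when a pressure metric crosses a threshold. `Get-DesiredHostCount`, the 70 percent threshold, and the `MemoryPressurePercent` input are assumptions; how you derive that percentage (for example, pool usage against a baseline you pick) is up to your monitoring.

```powershell
# Hypothetical helper: scale out by one host when memory pressure is high.
function Get-DesiredHostCount {
    param(
        [Parameter(Mandatory)] [int]    $CurrentHosts,
        [Parameter(Mandatory)] [double] $MemoryPressurePercent,
        [double] $ScaleOutAbove = 70,
        [int]    $MaxHosts = 10
    )
    if ($MemoryPressurePercent -gt $ScaleOutAbove) {
        # Never exceed the configured ceiling.
        return [math]::Min($CurrentHosts + 1, $MaxHosts)
    }
    return $CurrentHosts
}

# Usage:
# $target = Get-DesiredHostCount -CurrentHosts 3 -MemoryPressurePercent 85
```

The result can be supplied as the new capacity value in the scale-out commands above.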

    Architect's Pro Tip

    "The 'Out of Memory' error is often a *non-paged pool* exhaustion, not general RAM. A sudden spike in connections can trigger a latent driver leak (like TermDD.sys). Always correlate the error timestamp with PerfMon logs for 'Pool Nonpaged Bytes' and 'Pool Nonpaged Allocs' to confirm."

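Correlating the error with logged counters amounts to filtering samples to a window around the failure time. The sketch below assumes sample objects that expose a `Timestamp` property, as performance counter samples do; `Get-SamplesNearEvent` and the 15-minute default window are hypothetical.

```powershell
# Hypothetical helper: return samples within +/- WindowMinutes of an event.
function Get-SamplesNearEvent {
    param(
        [Parameter(Mandatory)] [object[]] $Samples,
        [Parameter(Mandatory)] [datetime] $EventTime,
        [int] $WindowMinutes = 15
    )
    $from = $EventTime.AddMinutes(-$WindowMinutes)
    $to   = $EventTime.AddMinutes($WindowMinutes)
    $Samples |
        Where-Object { $_.Timestamp -ge $from -and $_.Timestamp -le $to } |
        Sort-Object Timestamp
}

# Usage with a PerfMon log (Import-Counter ships with Windows PowerShell 5.1):
# $samples = (Import-Counter -Path <log.blg>).CounterSamples
# Get-SamplesNearEvent -Samples $samples -EventTime '2024-01-07T03:00:00'
```
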
    Frequently Asked Questions

    We patched the server, but the error returned weeks later. Why?

    Another driver or kernel component may have a separate, slower memory leak. The patch fixed one leak source, but high traffic over time exhausts the pool from another. Use Step 1 to identify the new top consumer.

    Can increasing RAM solve this?

    Not directly. Non-paged pool is kernel memory that must stay resident in physical RAM. On 64-bit Windows its ceiling scales with installed RAM, so adding memory buys time, but it does not fix a leak. The primary solutions are leak patches, pool monitoring, and horizontal scaling.

    Is a server reboot a valid long-term fix?

    No. A reboot clears the pool but is a reactive, disruptive solution. It is a temporary workaround while you implement the diagnostic and patching steps outlined above to address the root cause.
