With the current climate and radical change to working methods COVID-19 has brought worldwide, more of us are either working from home due to requirements or now enjoying the flexibility this new style of working has brought to the modern workplace.
This increased demand requires technology to support it, and Microsoft’s Always On VPN is being adopted by a large number of businesses because of its ease of deployment and the functionality it brings.
So you have Always On deployed, clients are connecting, connectivity is relatively stable and working well for you; connections come in across a perimeter firewall because you’ve designed it right. But sometimes, the connection for the client – which was working perfectly fine – suddenly gives an 809 error (cannot contact VPN Server).
No changes have occurred on the client, the server or the firewall, and Richard Hicks’ articles have been consulted and confirm that all is as it should be. But removing the client’s session from the firewall for the user’s public IP remediates the issue, and the client is able to connect again. Not ideal, but a workaround at least.
AOVPN Mobility Behaviour
The IKEv2 mobility functionality allows the VPN server to keep the connection “up” for a period of time after the client has been rudely disconnected. The Keepalives should time out after a period, but observation in the wild, across multiple environments, suggests that they do not. By itself this does not cause issues for the VPN service, because when the client comes back online it either reconnects the session or creates a new one. However, it can cause unexpected results when dealing with other devices on the network, like a perimeter firewall.
Often, these devices are configured to disconnect “dead” links after a period of inactivity. The Palo Alto firewall in this case was set to 30 seconds. Because the unwanted Keepalives kept traffic flowing across the wire, the firewall never closed the connection. When the client then tried to reconnect, the firewall refused the connection and the client received the 809 error. The only workaround in this particular case was to clear the firewall session, after which the client was able to connect again without any configuration or remediation on the server or client.
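The interaction described above can be illustrated with a small model. The following Python sketch is not taken from a real capture: the 10-second keepalive interval and the function name original_session_survives are assumptions made purely for illustration, while the 30-second idle timeout is the value from the firewall in this case. It shows why a session entry never ages out as long as keepalives arrive more often than the idle timeout.

```python
def original_session_survives(packet_times, idle_timeout, at_time):
    """Return True if a firewall session entry created by the first packet
    is still present at `at_time`, given packet arrival times in seconds.

    The entry is assumed to age out once no packet has been seen for longer
    than `idle_timeout` seconds (a simplified model of an idle timer).
    """
    last_seen = None
    for t in sorted(packet_times):
        if t > at_time:
            break
        if last_seen is not None and t - last_seen > idle_timeout:
            return False  # a gap longer than the timeout removed the entry
        last_seen = t
    return last_seen is not None and at_time - last_seen <= idle_timeout


IDLE_TIMEOUT = 30        # seconds, as configured on the firewall in this case
KEEPALIVE_INTERVAL = 10  # seconds, ASSUMED for illustration only

# Client vanishes at t=0; the server keeps sending keepalives for 10 minutes.
endless = list(range(0, 601, KEEPALIVE_INTERVAL))
# Client vanishes at t=0 and no further traffic crosses the firewall.
silent = [0]

stale_with_keepalives = original_session_survives(endless, IDLE_TIMEOUT, 300)
stale_without = original_session_survives(silent, IDLE_TIMEOUT, 300)
print(stale_with_keepalives, stale_without)
```

In this model the stale entry is still present five minutes later when keepalives continue, which is the condition under which the reconnecting client hit the 809 error.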
A Potential Solution
Looking further into this issue for a customer, to avoid having to rely on the workaround, we found little or no documentation on restricting the frequency of IKEv2 Keepalives, other than reducing the Idle or Network Outage times of the Mobility functionality.
Working on a test system, we were able to replicate these unwanted disconnects by switching the test device into Airplane mode, and we saw Keepalives being generated past the Network Outage threshold. Reducing the thresholds from the default 30 minutes to a more aggressive 5 minutes might not be right for every environment, but in initial testing normal VPN tunnel connectivity was not affected. However, this did not resolve the Keepalive issue, and so the firewall issue persisted.
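One way to quantify what was seen in this test is to count the Keepalives that still arrive after the Network Outage threshold has passed. The Python sketch below assumes the Keepalive timestamps have already been exported from a packet capture as seconds; the function name and the 20-second sample spacing are hypothetical and are not measured values.

```python
def keepalives_past_threshold(keepalive_times, disconnect_time, outage_threshold):
    """Return the keepalive timestamps (seconds) that arrive later than
    `outage_threshold` seconds after the client dropped at `disconnect_time`."""
    cutoff = disconnect_time + outage_threshold
    return [t for t in keepalive_times if t > cutoff]


# Hypothetical sample data: one keepalive every 20 seconds for 10 minutes
# after the client entered Airplane mode at t=0.
sample_times = list(range(0, 601, 20))

# Network Outage threshold reduced to 5 minutes (300 seconds), as in the test.
late = keepalives_past_threshold(sample_times, disconnect_time=0, outage_threshold=300)
print(len(late), "keepalives seen after the Network Outage threshold")
```

Any non-zero count here corresponds to the behaviour observed on the test system, where Keepalives continued beyond the configured threshold.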
An older Microsoft reference article on L2TP and PPTP protocol settings outlines settings that can be added to limit the frequency and timing of the Keepalives. As IKEv2 also uses HelloMS and ACK for its Keepalive mechanism, the following should theoretically work for the IKEv2 service.
This is experimental and is currently undergoing testing so please take these suggestions as unsupported/undocumented. PowerON takes no responsibility for issues or outages that may occur following implementation of the below.
Locate the appropriate registry key for IKEv2 and add the following DWORDs as required:
After implementing the changes and rebooting the server, the Airplane mode test was repeated. Once the VPN connection shown in the RRAS console had been dropped uncleanly, only a further ten Keepalive packets were observed from the server, as desired.
So we now potentially have a mechanism to regulate this known behaviour of the AOVPN server when using the IKEv2 protocol for VPN tunnels.
The results of this change will be provided to Microsoft, who are reportedly looking into potential future patches for this and other issues with the AOVPN service.
Heard about AOVPN DPC?
We’ve been deploying Always On VPN solutions for many years, and over time we have received a lot of feedback about the manageability of the solution.