Meraki Monitoring for WAN Failover

Connection Monitoring for WAN Failover

Connection Monitor Overview

When the primary uplink goes down on an MX Security Appliance, events appear under Network-wide > Monitor > Event log indicating a change in the primary uplink status. In these events, "uplink: 0" indicates that Internet 1 is being used, while "uplink: 1" indicates that Internet 2 is being used.
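
As a small illustration of the index-to-name mapping, the sketch below turns the "uplink: N" token into the corresponding Dashboard name. The function name `uplink_name` is hypothetical, the input is only the quoted token rather than a full event log line, and extending the "index plus one" rule beyond 0 and 1 is an assumption.

```python
import re
from typing import Optional

# Matches the "uplink: N" token quoted in the text above.
_UPLINK_RE = re.compile(r"uplink:\s*(\d+)")


def uplink_name(event_text: str) -> Optional[str]:
    """Return "Internet 1" for "uplink: 0", "Internet 2" for "uplink: 1".

    Returns None when the text contains no "uplink: N" token.
    """
    match = _UPLINK_RE.search(event_text)
    if match is None:
        return None
    # The event value is zero-based; the Dashboard name is one-based.
    return f"Internet {int(match.group(1)) + 1}"
```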

The preferred primary uplink can be configured in Dashboard, but this setting only matters when both uplinks are functioning. The MX will use the non-preferred uplink as the primary if it is the only one available. The MX monitors all uplinks and stops using an uplink when it determines that the uplink has no connectivity.
Note: If the MX is using the non-preferred uplink as the primary and the preferred uplink comes back online, the MX will wait about 15 seconds before switching the primary uplink to the preferred one. This is to prevent the primary connection from flapping in the event of intermittent failure or an unreliable link. 
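
The selection rules and the failback delay can be sketched as a small state machine. This is an illustrative model only, not the MX implementation: the class `UplinkSelector`, the uplink labels, and the exact 15.0 second constant (the text says "about 15 seconds") are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FAILBACK_DELAY_S = 15.0  # assumed exact value for "about 15 seconds"


@dataclass
class UplinkSelector:
    preferred: str
    other: str
    active: Optional[str] = None
    preferred_up_since: Optional[float] = None

    def update(self, now: float, up: Dict[str, bool]) -> Optional[str]:
        """Feed the current up/down state of both uplinks; return the primary."""
        pref_up = up.get(self.preferred, False)
        other_up = up.get(self.other, False)

        # Track how long the preferred uplink has been continuously up.
        if not pref_up:
            self.preferred_up_since = None
        elif self.preferred_up_since is None:
            self.preferred_up_since = now

        if pref_up and not other_up:
            self.active = self.preferred
        elif other_up and not pref_up:
            self.active = self.other
        elif not pref_up and not other_up:
            self.active = None
        elif self.active != self.preferred:
            # Both are up: fail back only after the hold-down has elapsed.
            stable_for = now - self.preferred_up_since
            if self.active is None or stable_for >= FAILBACK_DELAY_S:
                self.active = self.preferred
        return self.active
```

If the preferred uplink drops again before the delay elapses, the timer restarts, which is what keeps an unreliable link from causing the primary to flap.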

Connectivity Tests

The MX runs tests to determine uplink status:
DNS test
  • Query the DNS servers (primary or secondary) configured on the Internet interface for the following hosts:
    • meraki.com
    • google.com
    • yahoo.com
Internet test
ARP test
  • ARP for the default gateway and its own IP (to detect a conflict).
Connection monitoring runs on the uplink once it is activated, meaning a carrier is detected and an IP address is assigned (static or dynamic).

The MX sends the first test DNS query. If a DNS response is received, DNS is marked as good for 300 seconds on that uplink. During this time, the MX continues running the DNS test every 150 seconds. Each successful DNS test marks DNS as good for another 300 seconds.

If a test DNS query times out at any point, the MX decreases the testing interval to 30 seconds. If the DNS test keeps failing until more than 300 seconds have passed since the last successful test, DNS is marked as failed on the uplink. Otherwise, a successful test again marks DNS as good for another 300 seconds, and the test returns to running every 150 seconds.
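
The DNS timers described above can be modeled with a few lines of state. This is a sketch under stated assumptions, not the MX firmware logic: the class `DnsMonitor` and its method names are hypothetical, and treating DNS as "not good" before the first successful response is an assumption the text does not cover.

```python
from dataclasses import dataclass
from typing import Optional

GOOD_FOR_S = 300         # a success marks DNS as good for this long
NORMAL_INTERVAL_S = 150  # test interval while DNS is healthy
RETRY_INTERVAL_S = 30    # test interval after a timeout


@dataclass
class DnsMonitor:
    last_success: Optional[float] = None
    interval: int = NORMAL_INTERVAL_S

    def record(self, now: float, ok: bool) -> float:
        """Record a test result at time `now`; return when to test next."""
        if ok:
            self.last_success = now
            self.interval = NORMAL_INTERVAL_S
        else:
            self.interval = RETRY_INTERVAL_S
        return now + self.interval

    def is_good(self, now: float) -> bool:
        """DNS is good until more than GOOD_FOR_S passes without a success."""
        if self.last_success is None:
            return False  # assumption: no success yet means not good
        return now - self.last_success <= GOOD_FOR_S
```

For example, after a success at t=0 and a timeout at t=150, the next test is scheduled for t=180, and DNS is still considered good until more than 300 seconds have passed since t=0.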

The MX then begins performing the round-robin Internet test. If each of the tests is successful, the Internet is marked as good for 300 seconds on that uplink. During this time, the MX continues running the Internet test every 150 seconds. Each successful Internet test marks the Internet as good for another 300 seconds. If any test within the Internet test group fails, the MX decreases the testing interval to 20 seconds. If the tests keep failing until more than 300 seconds have passed since the last successful test, the Internet is marked as failed on the uplink. Otherwise, a successful test again marks the Internet as good for another 300 seconds, and the test returns to running every 150 seconds.
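
To make the retry schedule concrete, the sketch below lists when failing tests would run after the last success, for the 20-second Internet retry interval and the 30-second DNS retry interval. It assumes the link breaks right after a successful test and that the first failing test runs at the normal 150-second slot; the function name `failing_test_times` is hypothetical.

```python
from typing import List

GOOD_FOR_S = 300         # a success marks the test as good for this long
NORMAL_INTERVAL_S = 150  # interval between tests while healthy


def failing_test_times(retry_interval_s: int) -> List[int]:
    """Times, in seconds after the last success, at which failing tests run.

    The list ends with the first test that runs more than GOOD_FOR_S
    seconds after the last success.
    """
    times = [NORMAL_INTERVAL_S]
    while times[-1] <= GOOD_FOR_S:
        times.append(times[-1] + retry_interval_s)
    return times


internet_times = failing_test_times(20)  # Internet test retry interval
dns_times = failing_test_times(30)       # DNS test retry interval
```

Under these assumptions, the Internet test gets nine attempts (150 through 310 seconds) and the DNS test seven (150 through 330 seconds) before the 300-second window is exceeded.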

When both tests have been unsuccessful for more than 300 seconds, the uplink is failed over. Failover therefore takes approximately 5 minutes in the event of a soft failure (the link is still up but provides no upstream access).
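
The failover condition combines the two tests: the uplink is failed only when both DNS and Internet have gone more than 300 seconds without a success. A minimal sketch of that check follows; the function name `uplink_should_fail_over` and its parameters are hypothetical.

```python
FAIL_AFTER_S = 300  # seconds without a success before a test counts as failed


def uplink_should_fail_over(
    now: float, dns_last_ok: float, internet_last_ok: float
) -> bool:
    """Return True only when both tests have failed for over FAIL_AFTER_S."""
    dns_failed = now - dns_last_ok > FAIL_AFTER_S
    internet_failed = now - internet_last_ok > FAIL_AFTER_S
    return dns_failed and internet_failed
```

With this rule, an uplink whose DNS test has failed but whose Internet test still succeeds stays in use.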

SD-WAN Monitoring

For more information on SD-WAN monitoring, please refer to this article.
