NAC not working on 1 Desk, works everywhere else.

Hi all,

 

I have a strange one for you.

 

We have a two-node ISE 2.7 Patch 6 cluster and deployed NAC successfully about 7 months ago. We have a few issues now and again, but overall it is working well.

 

Recently there have been a number of desks where NAC has stopped working regardless of who uses that desk. NAC works for the same users on other desks without any problems. When they go back to "Desk X", they don't get any connectivity.

 

In the ISE logs, I can see that the user logged in at Desk 1 at 09:25 (see screenshot "NAC1.jpg"), authenticated successfully and got network access. When the user went to "Desk X", I can see entries in ISE at 09:52, 09:59 and 10:05. These three entries show a successful authentication with a session; however, the user's laptop reports that there is no internet. The three entries are from when we tried to eliminate the docking station, network cables and the switchport as a root cause.

 

Troubleshooting steps at Desk X:

  • User logged in at this desk at 09:52 using the docking station, no internet. ISE reports successful authentication with a session. User is patched into switchport Gi3/0/4.
  • User logged in at this desk at 09:59 using no docking station, no internet. ISE reports successful authentication with a session. User is patched into switchport Gi3/0/4.
  • Moved the patch cable from switchport Gi3/0/4 on STACK3 to Gi3/0/9 (same config) on the same switch. The switch thinks that there is nothing connected.

GigabitEthernet3/0/4 is up, line protocol is down (notconnect)

  Hardware is Gigabit Ethernet, address is 00bc.6094.2404 (bia 00bc.6094.2404)

  Description: ### User Access Port ###

  • Move cable from Gi3/0/4 to Gi3/0/9, same issue

GigabitEthernet3/0/9 is up, line protocol is down (notconnect)

  Hardware is Gigabit Ethernet, address is 00bc.6094.2409 (bia 00bc.6094.2409)

  Description: ### User Access Port ###

 

  • Removed NAC config, port came up and User logged in successfully, no issues with switchport or cabling.
  • Added NAC config, port came up and User logged in and successfully authenticated and logged into the network.

Switch config:

!
template PORT-AUTH-TEMPLATE
dot1x pae authenticator
mab
access-session host-mode multi-domain
access-session control-direction in
access-session closed
access-session port-control auto
authentication periodic
authentication timer reauthenticate server
service-policy type control subscriber INT-AUTH-POLICY
!
!
interface GigabitEthernet3/0/9
description ### User Access Port ###
switchport access vlan XX
switchport mode access
switchport voice vlan YY
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 10 0 0 0
queue-set 2
priority-queue out
mls qos trust device cisco-phone
mls qos trust cos
dot1x timeout tx-period 60
dot1x max-reauth-req 3
auto qos voip cisco-phone
storm-control broadcast level 30.00 25.00
storm-control action shutdown
storm-control action trap
source template PORT-AUTH-TEMPLATE
spanning-tree portfast edge
spanning-tree bpduguard enable
service-policy input AutoQoS-Police-CiscoPhone
end

Regardless of which user uses that desk, it will not work. The workaround is to remove the NAC config from the port and re-add it.

 

The switch is a 2960X-48FPD-L and is a member of a stack which has 3 switches in it. The software version is: 15.2(4)E6. 

 

This is occurring randomly across desks across numerous sites for the customer.

 

Any ideas?

 

 

 

 

 
 

 

 


Hi @Anthony O'Reilly ,

 1st ... about the aaa accounting update newinfo periodic 5 command, consider using 1440 or 2880.

 2nd ... during the issue, what is the output of the following commands:

#show mac address-table interface G3/0/4
#show ip device tracking interface G3/0/4
#show authentication sessions interface G3/0/4

Regards

I think device-tracking may be missing from the config. As Marcelo mentioned, let's see the output of those commands.

Hi @Marcelo Morais

 

#show mac address-table interface G3/0/4

There was nothing showing in the mac address-table for this interface. If I moved the device to another interface, it was in the mac address-table.

#show ip device tracking interface G3/0/4

I didn't run this at the time, I will the next time.

 

#show authentication sessions interface G3/0/4

 There was nothing in the output for this command, it was as if there was nothing patched into the switch port.

 

If I plugged a phone into the port, the phone would get power. There was no entry for the phone in the #sh cdp nei command.

 

I ran a shut and no shut on the ports that were having issues. I will wait for the next interface that has this issue and report back. I will test the command aaa accounting update newinfo periodic 1440 on three 2960 switches.

 

@Arne Bier Device tracking is on for this switch; it is configured globally using the command ip device tracking probe delay 10

This is the result from the tracking on the switch.

 

STACK01#sh ip device tracking all
Global IP Device Tracking for clients = Enabled
Global IP Device Tracking Probe Count = 3
Global IP Device Tracking Probe Interval = 30
Global IP Device Tracking Probe Delay Interval = 10
-----------------------------------------------------------------------------------------------
IP Address      MAC Address     Vlan  Interface               Probe-Timeout  State   Source
-----------------------------------------------------------------------------------------------
1.1.1.2         xxxx.xxxx.xxx   10    GigabitEthernet2/0/3    30             ACTIVE  ARP
1.1.1.4         xxxx.xxxx.xxx   10    GigabitEthernet2/0/33   30             ACTIVE  ARP
1.1.2.2         xxxx.xxxx.xxx   24    GigabitEthernet2/0/14   30             ACTIVE  ARP
1.1.1.8         xxxx.xxxx.xxx   10    GigabitEthernet1/0/1    30             ACTIVE  ARP

Hi @Anthony O'Reilly ,

1st: the "periodic" keyword of aaa accounting update newinfo periodic [1440 or 2880] ensures that a RADIUS Interim Accounting Update is sent to the ISE node every 1 or 2 days, regardless of whether the SW observes a change for the active session. If ISE fails to receive an Interim Accounting message for an endpoint session for more than 5 days, ISE will stop maintaining the session for that endpoint. A "periodic = 5" (the value is in minutes, so every 5 min) generates a lot of Interim Accounting Updates.
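
To put numbers on the periodic values discussed above, here is a minimal sketch of the arithmetic. It assumes the periodic value is expressed in minutes (which is how 1440 and 2880 map to 1 and 2 days) and uses the 5-day ISE limit quoted above; the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
ISE_SESSION_LIMIT_DAYS = 5  # limit quoted in the post above, not verified here


def interim_updates_per_day(periodic_minutes: int) -> float:
    """Interim accounting updates a single session sends per day."""
    if periodic_minutes <= 0:
        raise ValueError("periodic interval must be a positive number of minutes")
    return MINUTES_PER_DAY / periodic_minutes


def days_between_updates(periodic_minutes: int) -> float:
    """Gap between two interim updates, expressed in days."""
    if periodic_minutes <= 0:
        raise ValueError("periodic interval must be a positive number of minutes")
    return periodic_minutes / MINUTES_PER_DAY


def keeps_ise_session_alive(periodic_minutes: int) -> bool:
    """True if updates arrive more often than the quoted ISE limit."""
    return days_between_updates(periodic_minutes) < ISE_SESSION_LIMIT_DAYS


for minutes in (5, 1440, 2880):
    print(
        f"periodic {minutes}: "
        f"{interim_updates_per_day(minutes):g} updates/day per session, "
        f"within ISE limit: {keeps_ise_session_alive(minutes)}"
    )
```

Under these assumptions, both 1440 and 2880 stay well inside the 5-day window while sending far fewer updates per session than periodic 5.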

2nd: "device tracking" is enabled via (config)# ip device tracking (although it is enabled by default in 15.x+). The (config)# ip device tracking probe delay 10 command stops the SW from sending a probe for 10 sec after it detects a link up/flap (a good practice).

3rd: you said " ... There was nothing showing in the mac address-table for this interface ... ". If you check the #show logging info, does something like this appear? I mean the "Unknown MAC":

May 27 XX:XX:XX.XXX: %AUTHMGR-5-START: Starting 'dot1x' for client ("Unknown Mac") on Interface G3/0/4 AuditSessionID XXXXXXXXXXXXXXXXXXXXXXXX

 

Regards

Hi @Marcelo Morais 

 

Thanks for your quick response.

 

1. The switches in one site are now configured with aaa accounting update newinfo periodic 2880 and I am currently monitoring them.

 

2. The ip device tracking command has not changed

 

3. There is definitely nothing in the logs for Unknown for our testing on Gi3/0/4. I've searched logs for Unknown and unknown across all our switches looking for other examples and I don't have any entries for it.

 

Is there anything else I can check?

 

Thanks

Anthony.

Hi @Anthony O'Reilly ,

 in other words:

1. the MAC Addr Table is empty for G3/0/4

2. the Starting 'dot1x' for client on G3/0/4 is not appearing on the show logging

3. you are having problems with DeskX, even if you change the SW Port (G3/0/4 to G3/0/9).

My question:

At NAC1.JPG (your 1st image), ISE received an EAP-TLS packet from DeskX, but there is no Starting 'dot1x' for client on G3/0/4 in show logging and no MAC in the MAC Addr Table on G3/0/4 ... did you check whether the endpoint sent the packet to the SW (debug dot1x all) and whether the SW sent the packet to ISE (debug radius)?

 

Regards

Hi @Marcelo Morais 

 

1. "the MAC Addr Table is empty for G3/0/4": This is correct.

2. "the Starting 'dot1x' for client on G3/0/4 is not appearing on the show logging": I am not sure about this. At the time the log buffer was full and I cleared it to save me scrolling down. There were four ports in total on this switch; I checked the logs for Gi3/0/4, Gi3/0/9, Gi2/0/6 and Gi2/0/16.

3. "you are having problems with DeskX, even if you change the SW Port (G3/0/4 to G3/0/9)": This desk has two ports that had the same symptoms.

My question:

"At NAC1.JPG (your 1st image), ISE received an EAP-TLS packet from DeskX, but there is no Starting 'dot1x' for client on G3/0/4 on show logging and no MAC on the MAC Addr Table on G3/0/4 ... did you check if the Endpoint sent the packet to the SW (debug dot1x all) and the SW sent the packet to ISE (debug radius)?": Unfortunately, I didn't run any debugs; I will for the next one. A lot of people are due on site on Monday and I will be ready to collect logs and debugs. There was no Starting 'dot1x' for client in the logs.

 

I will hopefully have a live example for you on Monday, fingers crossed.

 

Thanks for all your help, much appreciated.

Hi @Marcelo Morais 

 

I have a good example this morning. I was able to replicate the issue whether the laptop was on a docking station or the cable was plugged directly into the laptop.

 

Laptop is in port Gi1/0/18. These two commands were run about 50 seconds after the device was connected to the port.

 

SW2#sh access-session int gi1/0/18
No sessions match supplied criteria.

Runnable methods list:
Handle Priority Name
6 5 dot1x
17 10 mab
15 15 webauth

 

SW#sh mac address-table int gi1/0/18
Mac Address Table
-------------------------------------------

Vlan Mac Address Type Ports
---- ----------- -------- -----
!
interface GigabitEthernet1/0/18
description ### User Access Port ###
switchport access vlan XX
switchport mode access
switchport voice vlan YY
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 10 0 0 0
queue-set 2
priority-queue out
mls qos trust device cisco-phone
mls qos trust cos
dot1x timeout tx-period 60
dot1x max-reauth-req 3
auto qos voip cisco-phone
storm-control broadcast level 30.00 25.00
storm-control action shutdown
storm-control action trap
source template PORT-AUTH-TEMPLATE
spanning-tree portfast edge
spanning-tree bpduguard enable
service-policy input AutoQoS-Police-CiscoPhone
end

 Debug dot1x all and debug radius are attached.
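
For anyone who wants to spot this symptom across many ports, here is a rough sketch of the check done by hand above: a port where show access-session reports no sessions and the MAC address table has no entries. The function names and the healthy-port sample are hypothetical; the two "stuck" samples are the outputs pasted above. The sketch only inspects text that has already been collected, it does not talk to a switch.

```python
import re

# Cisco-style dotted MAC address, e.g. 0011.2233.4455
MAC_RE = re.compile(r"\b[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\b")
NO_SESSION_MARKER = "No sessions match supplied criteria."


def count_learned_macs(mac_table_output: str) -> int:
    """Count MAC addresses appearing in 'show mac address-table' output."""
    return len(MAC_RE.findall(mac_table_output))


def port_looks_stuck(access_session_output: str, mac_table_output: str) -> bool:
    """True when there is neither an access session nor a learned MAC."""
    no_session = NO_SESSION_MARKER in access_session_output
    no_macs = count_learned_macs(mac_table_output) == 0
    return no_session and no_macs


SESSION_SAMPLE = """SW2#sh access-session int gi1/0/18
No sessions match supplied criteria.

Runnable methods list:
Handle Priority Name
6 5 dot1x
17 10 mab
15 15 webauth
"""

MAC_SAMPLE = """SW#sh mac address-table int gi1/0/18
Mac Address Table
-------------------------------------------

Vlan Mac Address Type Ports
---- ----------- -------- -----
"""

# Hypothetical healthy-port output, for comparison only.
HEALTHY_MAC_SAMPLE = MAC_SAMPLE + "  10    0011.2233.4455    STATIC      Gi1/0/18\n"

print(port_looks_stuck(SESSION_SAMPLE, MAC_SAMPLE))
print(port_looks_stuck(SESSION_SAMPLE, HEALTHY_MAC_SAMPLE))
```

A port with nothing plugged in would match as well, so this is only meaningful for ports where a device is known to be connected.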

 

I can do another test again if you wish.

 

Thanks

Anthony.

 

thomas
Cisco Employee

This is a good one for TAC since it sounds like you are under a time pressure to get it fixed.

It sounds like it could be a switch bug... consider upgrading the switch.

 

Hi @Marcelo Morais @thomas 

 

I think we may have hit this bug.

 

All of the issues so far have been on member switches of a switch stack.

 

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvv93417

 

What do you think?

Thanks

Anthony.

Hi @Anthony O'Reilly ,

 Interesting. CSCvv93417 ("2960x stack Member Switch fails wired dot1x; MasterSwitch passes dot1x using the same configs") could be the cause. Are you able to test your endpoint on a SW without a stack configuration?

As you said: " ... The Switch is a 2960X-48FPD-L and is a Member of a stack which has 3 Switches in it. The software version is: 15.2(4)E6... "

  I checked the EAP and RADIUS debug:

ISE IP        "10.10.10.100"
NAS IP        "10.10.1.1"
Framed-MTU    "1500"
Endpoint IP   "10.10.10.10"
Endpoint MAC  "48-2A-E3-3E-72-04"

I'm able to verify the Access-Request from the NAD to ISE and the Access-Challenge from ISE back to the NAD.

Attention to:

May 30 10:40:03.385: RADIUS/ENCODE(00000000):Orig. component type = Invalid

Please take a look at CSCuu75107 ("2960 - traceback with EAP Authentication timeout"), although there is no RESULT_OVERRIDE in your logs.
Attention to:

May 30 10:40:02.839: RADIUS/DECODE: EAP-Message fragments, 253+253+253+249, total 1008 bytes

It is always good to double-check fragmentation/MTU end to end (especially if you have a FW in between) ... just in case : )
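
As a side note on the "253+253+253+249, total 1008 bytes" line: those numbers are what you get when one EAP message is split across several RADIUS EAP-Message attributes, each of which can carry at most 253 bytes of value (255-byte attribute limit minus the 2-byte type/length header). This is attribute-level splitting inside the RADIUS packet, separate from IP fragmentation on the path. A minimal sketch of the arithmetic, with an illustrative function name:

```python
from typing import List

RADIUS_ATTR_MAX_LEN = 255   # an attribute's length field is one byte
RADIUS_ATTR_HEADER = 2      # type (1 byte) + length (1 byte)
MAX_ATTR_VALUE = RADIUS_ATTR_MAX_LEN - RADIUS_ATTR_HEADER  # 253 bytes


def eap_fragment_sizes(total_bytes: int) -> List[int]:
    """Sizes of the EAP-Message attribute values needed for one EAP message."""
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    full, rest = divmod(total_bytes, MAX_ATTR_VALUE)
    return [MAX_ATTR_VALUE] * full + ([rest] if rest else [])


sizes = eap_fragment_sizes(1008)
print("+".join(str(s) for s in sizes), "=", sum(sizes))
print("bytes including attribute headers:", sum(sizes) + RADIUS_ATTR_HEADER * len(sizes))
```

So the fragment sizes in the debug are expected on their own; the end-to-end MTU check is about whether the resulting RADIUS packets make it across the path intact.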

 

Regards

Hi @Marcelo Morais 

 

This is excellent.

 

The next steps:

  • Upgrade the switch stack to 15.2(7)E5 and monitor for two weeks with a periodic timer of 2880.
  • Check the FW MTU setting for all user subnets.
  • In the meantime, the temporary fix for the current issue is to remove NAC from the port and add it back in again:
    • Remove NAC
no dot1x timeout tx-period 60
no dot1x max-reauth-req 3
no source template PORT-AUTH-TEMPLATE
shut
no shut
    • Add NAC
dot1x timeout tx-period 60
dot1x max-reauth-req 3
source template PORT-AUTH-TEMPLATE
shut
no shut

Just to point out that if a switchport starts behaving like this, it doesn't matter who logs into the network on these ports, they will not get network access. This includes printers and phones (MAB).
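
Since this workaround may need repeating on several ports, here is a small sketch that builds the same command sequence for a given interface. It only produces text to paste into config mode; it does not connect to a switch. The function name is hypothetical, and the leading interface line is an assumption about where the commands are entered.

```python
from typing import List

# Per-port NAC lines taken from the interface config in this thread.
NAC_PORT_COMMANDS = [
    "dot1x timeout tx-period 60",
    "dot1x max-reauth-req 3",
    "source template PORT-AUTH-TEMPLATE",
]
BOUNCE = ["shut", "no shut"]


def nac_reset_commands(interface: str) -> List[str]:
    """Command list: remove NAC, bounce the port, add NAC back, bounce again."""
    if not interface.strip():
        raise ValueError("interface name must not be empty")
    remove = ["no " + cmd for cmd in NAC_PORT_COMMANDS]
    return [f"interface {interface}"] + remove + BOUNCE + NAC_PORT_COMMANDS + BOUNCE


print("\n".join(nac_reset_commands("GigabitEthernet3/0/4")))
```

If the port template or timers differ on your switches, adjust NAC_PORT_COMMANDS to match the running config before using the output.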

 

Hopefully an IOS upgrade will resolve this.

 

Thanks

Anthony.

Hi @Anthony O'Reilly ,

 excellent Action Plan !!!!

 

Regards

Hi @Marcelo Morais 

Just to let you know that the IOS upgrade from 15.2(7)E3 to 15.2(7)E5 on the Cisco 2960X switches worked. We haven't encountered this issue since.

Thanks for all your help. Much appreciated.