07-24-2023 09:05 AM
Hi,
I'm trying to understand why our new ISE node is not authenticating devices.
The scenario is this:
We have an on-prem node running v2.7, which is to be decommissioned.
We deployed a new node in Azure running v3.2.
I exported the config from v2.7 and imported it into v3.2. Checking the config of the new node, everything looks good: all certs are there, policy sets, profiles, etc. Everything looks healthy.
My test lab is a Meraki MX with an AP.
The RADIUS configuration points to the internal IP of the new node (Meraki proxy not ticked).
Firewall rules on the MX and our main firewall allow comms on ports 1812/1813, and I can confirm I'm seeing traffic go through successfully.
When I look at the new node's live logs, I can see my laptop's attempts to authenticate, but they fail with the following:
12805 | Extracted TLS ClientHello message
12806 | Prepared TLS ServerHello message
12807 | Prepared TLS Certificate message
12808 | Prepared TLS ServerKeyExchange message
12809 | Prepared TLS CertificateRequest message
12810 | Prepared TLS ServerDone message
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
5440 | Endpoint abandoned EAP session and started new (Step latency=18680 ms)
Event | 5440 Endpoint abandoned EAP session and started new |
Failure Reason | 5440 Endpoint abandoned EAP session and started new |
Resolution | Verify known NAD or supplicant issues and published bugs. Verify NAD and supplicant configuration. |
Root cause | Endpoint started new authentication while previous is still in progress. Most probable that supplicant on that endpoint stopped conducting the previous authentication and started the new one. Closing the previous authentication. |
11001 | Received RADIUS Access-Request |
11017 | RADIUS created a new session |
11117 | Generated a new session ID |
15049 | Evaluating Policy Group |
15008 | Evaluating Service Selection Policy |
15048 | Queried PIP - Normalised Radius.RadiusFlowType |
15048 | Queried PIP - Radius.Called-Station-ID |
11507 | Extracted EAP-Response/Identity |
12500 | Prepared EAP-Request proposing EAP-TLS with challenge |
11006 | Returned RADIUS Access-Challenge |
11001 | Received RADIUS Access-Request |
11018 | RADIUS is re-using an existing session |
12502 | Extracted EAP-Response containing EAP-TLS challenge-response and accepting EAP-TLS as negotiated |
12800 | Extracted first TLS record; TLS handshake started |
12545 | Client requested EAP-TLS session ticket |
12542 | The EAP-TLS session ticket received from supplicant while the stateless session resume is disabled. Performing full authentication |
There's also mention of certificate issues:
12935 Supplicant stopped responding to ISE during EAP-TLS certificate exchange
Supplicant stopped responding to ISE during EAP-TLS certificate exchange
Verify that supplicant is configured properly to conduct a full EAP conversation with ISE. Verify that NAS is configured properly to transfer EAP messages to/from supplicant. Verify that supplicant or NAS does not have a short timeout for EAP conversation. Check the network that connects the Network Access Server to ISE. Verify that ISE local server certificate is trusted on supplicant. Verify that supplicant has a properly configured user/machine certificate.
When I change the RADIUS settings back to the old node, authentication works, so this tells me that the certs are all good.
So I'm a bit stuck as to what is missing here.
Any suggestions on what else I can check?
I have raised a case with TAC and am waiting for a reply.
Cheers
07-24-2023 01:17 PM
Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
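For reference, the usual way to probe path MTU with ping is to set the Don't Fragment bit and vary the payload size. Note that the ICMP payload excludes 28 bytes of headers, so a 1500-byte MTU corresponds to a 1472-byte payload. A minimal sketch (the flag syntax below is the common Linux/Windows/macOS form and may differ on your OS; `<ise-node-ip>` is a placeholder):

```shell
# ICMP payload for a target MTU = MTU - 20 (IP header) - 8 (ICMP header)
mtu=1500
payload=$((mtu - 28))
echo "$payload"   # 1472

# Then ping with the DF bit set (syntax varies by OS):
#   Linux:    ping -M do -s "$payload" <ise-node-ip>
#   Windows:  ping -f -l "$payload" <ise-node-ip>
#   macOS:    ping -D -s "$payload" <ise-node-ip>
# If the reply says the packet needs to be fragmented but DF is set,
# step the payload down until the ping succeeds; that is your path limit.
```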
07-25-2023 01:24 AM
@Arne Bier wrote: Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
I will need to check to see if this is possible, and I also need to find the ping command for this.
I'll post back later.
Cheers
07-25-2023 01:39 AM
@Arne Bier wrote: Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
I tested the MTU theory and can ping the ISE node with a max byte size of 1398.
Anything over 1398 either times out or says "packet needs to be fragmented but DF set".
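As a back-of-envelope check (assuming the standard 28 bytes of IP + ICMP header overhead on each ping), that 1398-byte payload ceiling implies a path MTU of about 1426 bytes on the way to the Azure node, noticeably below the usual 1500:

```shell
# Largest working ICMP payload observed to the new node
payload=1398
# Add back the 20-byte IP header and 8-byte ICMP header to estimate path MTU
echo $((payload + 28))   # 1426
```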
07-24-2023 01:22 PM
Check the user and machine certs. Are they issued from the same CA?
07-25-2023 01:23 AM
@MHM Cisco World wrote: Check the user and machine certs. Are they issued from the same CA?
We use machine certs for this, and yes, I can confirm that they are from the same root CA.
07-25-2023 01:20 PM
I'd say the problem is that these oversized PDUs (the certs in the RADIUS/UDP payload) don't fit into the allowed MTU size; fragmentation must be allowed. I just tested in my lab, and the ping from the user VLAN to the ISE VLAN fails when I send pings of 1501 bytes or larger. If you log into the ISE CLI (not sure if that is possible in Azure?) you see the MTU for the interface is 1500 bytes.
07-26-2023 01:19 AM
@Arne Bier wrote: I'd say the problem is that these oversized PDUs (the certs in the RADIUS/UDP payload) don't fit into the allowed MTU size; fragmentation must be allowed. I just tested in my lab, and the ping from the user VLAN to the ISE VLAN fails when I send pings of 1501 bytes or larger. If you log into the ISE CLI (not sure if that is possible in Azure?) you see the MTU for the interface is 1500 bytes.
I'm not very familiar with MTU sizes and how it all works, so bear with me as I try to grasp what's going on.
I tried the ping test to our on-prem node (v2.7), which is hosted in our DC, and the highest byte size I could send there before it timed out or said "packet needs to be fragmented but DF set" is 1404.
Anything over 1404 did not reply.
When setting the byte size to 1500, I get "packet needs to be fragmented but DF set".
That said, I changed the RADIUS details in my test lab to point to the on-prem node using the internal IP and can confirm that authentication worked and my laptop connected successfully.
I'm not entirely sure what that tells you about the MTU here, but there is a difference between the two nodes:
Old node: largest byte size is 1404
New node: largest byte size is 1398
And yes, I can access the ISE CLI in Azure. I'll check what the MTU is set to and post back.
07-26-2023 01:33 AM - edited 07-26-2023 01:36 AM
I can confirm that the GigabitEthernet 0 interface on both of our ISE nodes is set to MTU 1500.
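For anyone checking the same thing, the ADE-OS commands look roughly like this. This is a sketch from memory, not verified against this thread: confirm the exact syntax (in particular `ip mtu`) against the Cisco ISE CLI reference for your version before changing anything, and expect the interface to bounce when you apply it.

```
ise/admin# show interface GigabitEthernet 0
! look for the "mtu 1500" value in the output

ise/admin# configure terminal
ise/admin(config)# interface GigabitEthernet 0
ise/admin(config-GigabitEthernet)# ip mtu 1300
```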
07-26-2023 01:56 AM - edited 07-26-2023 02:17 AM
I did a Wireshark capture and can see info where it says:
Fragmented IP protocol (proto=UDP 17, off=0, ID=179f) [Reassembled in #37]
Frame 36: 1428 bytes on wire (11424 bits), 1428 bytes captured (11424 bits)
Encapsulation type: Raw IP (7)
Arrival Time: Jul 26, 2023 09:50:40.015810000 GMT Summer Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1690361440.015810000 seconds
[Time delta from previous captured frame: 0.147329000 seconds]
[Time delta from previous displayed frame: 0.326792000 seconds]
[Time since reference or first frame: 3.459401000 seconds]
Frame Number: 36
Frame Length: 1428 bytes (11424 bits)
Capture Length: 1428 bytes (11424 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: raw:ip:data]
Raw packet data
Internet Protocol Version 4, Src: 172.19.107.2, Dst: 10.81.12.9
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 1428
Identification: 0x179f (6047)
001. .... = Flags: 0x1, More fragments
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 63
Protocol: UDP (17)
Header Checksum: 0x114b [validation disabled]
[Header checksum status: Unverified]
Source Address: 172.19.107.2
Destination Address: 10.81.12.9
[Reassembled IPv4 in frame: 37]
Data (1408 bytes)
Data: d15b071407dd218401fb07d5288650e001a21a03acde4f9f66434a6a012b686f73742f37…
[Length: 1408]
The source is my lab AP, and the destination is the new ISE node in Azure.
So if I'm understanding correctly, and as you guys have mentioned, the packet is being fragmented somewhere along the way.
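The numbers in the capture line up with that reading (a rough sanity check, not a definitive diagnosis): frame 36 is 1428 bytes with the More Fragments flag set and offset 0, i.e. a 20-byte IP header plus 1408 bytes of UDP/RADIUS data, with the remainder of the datagram reassembled in frame 37. So the RADIUS packet carrying the cert exchange was bigger than the path allows and got split:

```shell
# Frame 36: IP Total Length should equal header + carried data
ip_header=20
data=1408
echo $((ip_header + data))   # 1428, matching the Total Length field in the capture
```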
07-26-2023 02:49 AM
Going through the Wireshark captures, I can see that the AP is not getting an Access-Accept reply back from the new node.
Comparing captures between the old and new nodes, something is definitely happening along the way.
Capture to new ISE node: auth failed.
Capture to old node: auth successful.
07-26-2023 03:19 AM
A lot of interesting observations so far, but you're probably looking for a solution at this stage. From everything I have read, fragmentation can go wrong when there is a firewall in between that is dropping the fragments. Some load balancers may also need to be made aware that there is UDP fragmentation involved. What is the NAD? Is it a wireless access point? And are you able to set the MTU on that device to match that of the ISE node? You can also set the MTU in ISE on the CLI, but be careful: test in the lab, not production.
07-26-2023 03:34 AM
@Arne Bier wrote: A lot of interesting observations so far, but you're probably looking for a solution at this stage. From everything I have read, fragmentation can go wrong when there is a firewall in between that is dropping the fragments. Some load balancers may also need to be made aware that there is UDP fragmentation involved. What is the NAD? Is it a wireless access point? And are you able to set the MTU on that device to match that of the ISE node? You can also set the MTU in ISE on the CLI, but be careful: test in the lab, not production.
We've got a FortiGate firewall managing traffic to and from Azure, so we are checking to see if there's anything on there that's causing the issue. We also have internal load balancers working with the FortiGate, so we're going to check those as well.
The NAD is the WAP.
Unfortunately there's no setting on the AP that allows me to edit the MTU; I think I'd need to contact support to make any changes.
I feel like we are getting closer to whatever is dropping these packets. I suspect it's a networking/firewall issue.
07-27-2023 02:54 AM
I found this interesting post on Microsoft's docs:
https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-tcpip-performance-tuning
The default MTU for Azure VMs is 1,500 bytes. The Azure Virtual Network stack will attempt to fragment a packet at 1,400 bytes.
Note that the Virtual Network stack isn't inherently inefficient because it fragments packets at 1,400 bytes even though VMs have an MTU of 1,500. A large percentage of network packets are much smaller than 1,400 or 1,500 bytes.
I've been in contact with Meraki support, who have confirmed that the AutoVPN MTU is set to 1432.
So based on the above, it's possible that the Azure virtual network is causing our issue.
I'm asking Meraki support if they can change the AutoVPN MTU to a lower number, say 1300.
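Purely as a back-of-envelope check (again assuming 28 bytes of IP + ICMP overhead per ping), the measured ping ceilings line up with the two limits mentioned in this thread: the on-prem node's 1404-byte payload corresponds exactly to the 1432-byte Meraki AutoVPN MTU, while the Azure node's 1398-byte payload gives a 1426-byte path, 6 bytes tighter, consistent with some extra overhead on the Azure leg near the ~1400-byte point where the virtual network stack starts fragmenting:

```shell
# Path MTU estimate = max working ICMP payload + 20 (IP) + 8 (ICMP)
echo $((1404 + 28))   # 1432 -- on-prem path, equals the Meraki AutoVPN MTU
echo $((1398 + 28))   # 1426 -- Azure path, 6 bytes tighter
```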
07-28-2023 06:44 AM
Looks like I got it sorted. I changed the MTU on the Azure ISE node to 1300, and authentication worked straight away.
I came across this GitHub post:
https://github.com/MicrosoftDocs/azure-docs/issues/69477#issuecomment-1318717067
One of the steps there mentions changing the MTU on the ISE node, so I thought I might as well try it; the node isn't working properly as it is, so there's no harm in doing that.
And voilà, it worked straight away.
As long as there are no issues with the MTU being at 1300 and authentications work, we will stick with this.
More testing will be done next week.
I've been banging my head for a week trying to resolve this. What a win for a Friday!