07-24-2023 09:05 AM
Hi,
I'm trying to understand why our new ISE node is not authenticating devices.
The scenario is this:
We have an on-prem node running v2.7, which is to be decommissioned.
We deployed a new node in Azure running v3.2.
I exported the config from v2.7 and imported it into v3.2. Checking the config of the new node, everything looks good: all certs are there, policy sets, profiles, etc. Everything looks healthy.
My test lab is a Meraki MX with an AP.
The RADIUS configuration points to the internal IP of the new node (Meraki proxy not ticked).
Firewall rules on the MX and our main firewall allow comms on ports 1812/1813, and I can confirm I'm seeing traffic go through successfully.
When I look at the new node's live logs, I can see my laptop's attempts to authenticate, but they fail with the following:
12805 | Extracted TLS ClientHello message
12806 | Prepared TLS ServerHello message
12807 | Prepared TLS Certificate message
12808 | Prepared TLS ServerKeyExchange message
12809 | Prepared TLS CertificateRequest message
12810 | Prepared TLS ServerDone message
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
11001 | Received RADIUS Access-Request
11018 | RADIUS is re-using an existing session
12504 | Extracted EAP-Response containing EAP-TLS challenge-response
12505 | Prepared EAP-Request with another EAP-TLS challenge
11006 | Returned RADIUS Access-Challenge
5440 | Endpoint abandoned EAP session and started new (Step latency=18680 ms)
Event | 5440 Endpoint abandoned EAP session and started new |
Failure Reason | 5440 Endpoint abandoned EAP session and started new |
Resolution | Verify known NAD or supplicant issues and published bugs. Verify NAD and supplicant configuration. |
Root cause | Endpoint started new authentication while previous is still in progress. Most probable that supplicant on that endpoint stopped conducting the previous authentication and started the new one. Closing the previous authentication. |
11001 | Received RADIUS Access-Request |
11017 | RADIUS created a new session |
11117 | Generated a new session ID |
15049 | Evaluating Policy Group |
15008 | Evaluating Service Selection Policy |
15048 | Queried PIP - Normalised Radius.RadiusFlowType |
15048 | Queried PIP - Radius.Called-Station-ID |
11507 | Extracted EAP-Response/Identity |
12500 | Prepared EAP-Request proposing EAP-TLS with challenge |
11006 | Returned RADIUS Access-Challenge |
11001 | Received RADIUS Access-Request |
11018 | RADIUS is re-using an existing session |
12502 | Extracted EAP-Response containing EAP-TLS challenge-response and accepting EAP-TLS as negotiated |
12800 | Extracted first TLS record; TLS handshake started |
12545 | Client requested EAP-TLS session ticket |
12542 | The EAP-TLS session ticket received from supplicant while the stateless session resume is disabled. Performing full authentication |
There's also mention of certificate issues:
12935 Supplicant stopped responding to ISE during EAP-TLS certificate exchange
Supplicant stopped responding to ISE during EAP-TLS certificate exchange
Verify that supplicant is configured properly to conduct a full EAP conversation with ISE. Verify that NAS is configured properly to transfer EAP messages to/from supplicant. Verify that supplicant or NAS does not have a short timeout for EAP conversation. Check the network that connects the Network Access Server to ISE. Verify that ISE local server certificate is trusted on supplicant. Verify that supplicant has a properly configured user/machine certificate.
When I change the RADIUS settings back to the old node, authentication works, so this tells me that the certs are all good.
So I'm a bit stuck as to what is missing here.
Any suggestions on what else I can check?
I have raised a case with TAC and am waiting for a reply.
Cheers
07-24-2023 01:17 PM
Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
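For reference, the usual way to probe path MTU with ping is to set the Don't Fragment bit and vary the payload size. Note that the ICMP payload excludes 28 bytes of headers, so a 1500-byte MTU corresponds to a 1472-byte payload. A minimal sketch (the flag syntax below is the common Linux/Windows/macOS form and may differ on your OS; `<ise-node-ip>` is a placeholder):

```shell
# ICMP payload for a target MTU = MTU - 20 (IP header) - 8 (ICMP header)
mtu=1500
payload=$((mtu - 28))
echo "$payload"   # 1472

# Then ping with the DF bit set (syntax varies by OS):
#   Linux:    ping -M do -s "$payload" <ise-node-ip>
#   Windows:  ping -f -l "$payload" <ise-node-ip>
#   macOS:    ping -D -s "$payload" <ise-node-ip>
# If the reply says the packet needs to be fragmented but DF is set,
# step the payload down until the ping succeeds; that is your path limit.
```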
07-25-2023 01:24 AM
@Arne Bier wrote: Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
I will need to check to see if this is possible, and I also need to find the ping command for this.
I'll post back later.
Cheers
07-25-2023 01:39 AM
@Arne Bier wrote: Is there any way you can check the MTU size on the LAN segment on which the new ISE node is installed? I have seen cases in the past where the MTU is negotiated to be larger than what the ISE segment allows; since the size of the cert exchanges can exceed the MTU, the packets get fragmented. If I recall correctly, ISE must be on a segment with a 1500-byte MTU.
I think you can test the MTU theory with the ping command, using various payload sizes (to see where the limit is).
I tested the MTU theory and can ping the ISE node with a max byte size of 1398.
Anything over 1398 either times out or says "packet needs to be fragmented but DF set".
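As a back-of-envelope check (assuming the standard 28 bytes of IP + ICMP header overhead on each ping), that 1398-byte payload ceiling implies a path MTU of about 1426 bytes on the way to the Azure node, noticeably below the usual 1500:

```shell
# Largest working ICMP payload observed to the new node
payload=1398
# Add back the 20-byte IP header and 8-byte ICMP header to estimate path MTU
echo $((payload + 28))   # 1426
```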
07-24-2023 01:22 PM
Check the user and machine certs. Are they issued from the same CA?
07-25-2023 01:23 AM
@MHM Cisco World wrote: Check the user and machine certs. Are they issued from the same CA?
We use machine certs for this, and yes, I can confirm that they are from the same root CA.
07-25-2023 01:20 PM
I'd say the problem is that these oversized PDUs (the certs in the RADIUS/UDP payload) don't fit into the allowed MTU size; fragmentation must be allowed. I just tested in my lab, and the ping from the user VLAN to the ISE VLAN fails when I send pings of 1501 bytes or larger. If you log into the ISE CLI (not sure if that is possible in Azure?) you see the MTU for the interface is 1500 bytes.
07-26-2023 01:19 AM
@Arne Bier wrote: I'd say the problem is that these oversized PDUs (the certs in the RADIUS/UDP payload) don't fit into the allowed MTU size; fragmentation must be allowed. I just tested in my lab, and the ping from the user VLAN to the ISE VLAN fails when I send pings of 1501 bytes or larger. If you log into the ISE CLI (not sure if that is possible in Azure?) you see the MTU for the interface is 1500 bytes.
I'm not very familiar with MTU sizes and how it all works, so bear with me as I try to grasp what's going on.
I tried the ping test to our on-prem node (v2.7), which is hosted in our DC, and the highest byte size I could send there before it timed out or said "packet needs to be fragmented but DF set" is 1404.
Anything over 1404 did not reply.
When setting the byte size to 1500, I get "packet needs to be fragmented but DF set".
That said, I changed the RADIUS details in my test lab to point to the on-prem node using the internal IP and can confirm that authentication worked and my laptop connected successfully.
I'm not entirely sure what that tells you about the MTU here, but there is a difference between the two nodes:
Old node: largest byte size is 1404
New node: largest byte size is 1398
And yes, I can access the ISE CLI in Azure. I'll check what the MTU is set to and post back.
07-26-2023 01:33 AM - edited 07-26-2023 01:36 AM
I can confirm that the GigabitEthernet 0 interface on both of our ISE nodes is set to MTU 1500.
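For anyone checking the same thing, the ADE-OS commands look roughly like this. This is a sketch from memory, not verified against this thread: confirm the exact syntax (in particular `ip mtu`) against the Cisco ISE CLI reference for your version before changing anything, and expect the interface to bounce when you apply it.

```
ise/admin# show interface GigabitEthernet 0
! look for the "mtu 1500" value in the output

ise/admin# configure terminal
ise/admin(config)# interface GigabitEthernet 0
ise/admin(config-GigabitEthernet)# ip mtu 1300
```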
07-26-2023 01:56 AM - edited 07-26-2023 02:17 AM
I did a Wireshark capture and can see info where it says:
Fragmented IP protocol (proto=UDP 17, off=0, ID=179f) [Reassembled in #37]
Frame 36: 1428 bytes on wire (11424 bits), 1428 bytes captured (11424 bits)
Encapsulation type: Raw IP (7)
Arrival Time: Jul 26, 2023 09:50:40.015810000 GMT Summer Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1690361440.015810000 seconds
[Time delta from previous captured frame: 0.147329000 seconds]
[Time delta from previous displayed frame: 0.326792000 seconds]
[Time since reference or first frame: 3.459401000 seconds]
Frame Number: 36
Frame Length: 1428 bytes (11424 bits)
Capture Length: 1428 bytes (11424 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: raw:ip:data]
Raw packet data
Internet Protocol Version 4, Src: 172.19.107.2, Dst: 10.81.12.9
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 1428
Identification: 0x179f (6047)
001. .... = Flags: 0x1, More fragments
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 63
Protocol: UDP (17)
Header Checksum: 0x114b [validation disabled]
[Header checksum status: Unverified]
Source Address: 172.19.107.2
Destination Address: 10.81.12.9
[Reassembled IPv4 in frame: 37]
Data (1408 bytes)
Data: d15b071407dd218401fb07d5288650e001a21a03acde4f9f66434a6a012b686f73742f37…
[Length: 1408]
The source is my lab AP, and the destination is the new ISE node in Azure.
So if I'm understanding correctly, and as you guys have mentioned, the packet is being fragmented somewhere along the way.
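The numbers in the capture line up with that reading (a rough sanity check, not a definitive diagnosis): frame 36 is 1428 bytes with the More Fragments flag set and offset 0, i.e. a 20-byte IP header plus 1408 bytes of UDP/RADIUS data, with the remainder of the datagram reassembled in frame 37. So the RADIUS packet carrying the cert exchange was bigger than the path allows and got split:

```shell
# Frame 36: IP Total Length should equal header + carried data
ip_header=20
data=1408
echo $((ip_header + data))   # 1428, matching the Total Length field in the capture
```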
07-26-2023 02:49 AM
Going through the Wireshark captures, I can see that the AP is not getting an Access-Accept reply back from the new node.
Comparing captures between the old and new nodes, something is definitely happening along the way.
Capture to new ISE node: auth failed.
Capture to old node: auth successful.
07-26-2023 03:19 AM
A lot of interesting observations so far, but you're probably looking for a solution at this stage. From everything I have read, fragmentation can go wrong when there is a firewall in between that is dropping the fragments. Some load balancers may also need to be made aware that there is UDP fragmentation involved. What is the NAD? Is it a wireless access point? And are you able to set the MTU on that device to match that of the ISE node? You can also set the MTU in ISE on the CLI, but be careful: test in the lab, not production.
07-26-2023 03:34 AM
@Arne Bier wrote: A lot of interesting observations so far, but you're probably looking for a solution at this stage. From everything I have read, fragmentation can go wrong when there is a firewall in between that is dropping the fragments. Some load balancers may also need to be made aware that there is UDP fragmentation involved. What is the NAD? Is it a wireless access point? And are you able to set the MTU on that device to match that of the ISE node? You can also set the MTU in ISE on the CLI, but be careful: test in the lab, not production.
We've got a FortiGate firewall managing traffic to and from Azure, so we are checking to see if there's anything on there that's causing the issue. We also have internal load balancers working with the FortiGate, so we're going to check those as well.
The NAD is the WAP.
Unfortunately there's no setting on the AP that allows me to edit the MTU; I think I'd need to contact support to make any changes.
I feel like we are getting closer to whatever is dropping these packets. I suspect it's a networking/firewall issue.
07-27-2023 02:54 AM
I found this interesting post on Microsoft's docs:
https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-tcpip-performance-tuning
The default MTU for Azure VMs is 1,500 bytes. The Azure Virtual Network stack will attempt to fragment a packet at 1,400 bytes.
Note that the Virtual Network stack isn't inherently inefficient because it fragments packets at 1,400 bytes even though VMs have an MTU of 1,500. A large percentage of network packets are much smaller than 1,400 or 1,500 bytes.
I've been in contact with Meraki support, who have confirmed that the AutoVPN MTU is set to 1432.
So based on the above, it's possible that the Azure virtual network is causing our issue.
I'm asking Meraki support if they can change the AutoVPN MTU to a lower number, say 1300.
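Purely as a back-of-envelope check (again assuming 28 bytes of IP + ICMP overhead per ping), the measured ping ceilings line up with the two limits mentioned in this thread: the on-prem node's 1404-byte payload corresponds exactly to the 1432-byte Meraki AutoVPN MTU, while the Azure node's 1398-byte payload gives a 1426-byte path, 6 bytes tighter, consistent with some extra overhead on the Azure leg near the ~1400-byte point where the virtual network stack starts fragmenting:

```shell
# Path MTU estimate = max working ICMP payload + 20 (IP) + 8 (ICMP)
echo $((1404 + 28))   # 1432 -- on-prem path, equals the Meraki AutoVPN MTU
echo $((1398 + 28))   # 1426 -- Azure path, 6 bytes tighter
```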
07-28-2023 06:44 AM
Looks like I got it sorted. I changed the MTU on the Azure ISE node to 1300, and authentication worked straight away.
I came across this GitHub post:
https://github.com/MicrosoftDocs/azure-docs/issues/69477#issuecomment-1318717067
One of the steps there mentions changing the MTU on the ISE node, so I thought I might as well try it; the node isn't working properly as it is, so there's no harm in doing that.
And voilà, it worked straight away.
As long as there are no issues with the MTU being at 1300 and authentications work, we will stick with this.
More testing will be done next week.
I've been banging my head for a week trying to resolve this. What a win for a Friday!