
How to clear Fault 1394 (port down, used by Fabric)

Hi community,

I am trying to clear fault F1394 (Port is down, reason:sfpAbsent(connected), used by:Fabric) by temporarily setting the fault severity to "squelched". The fault was raised after I moved the leaf uplink cables to a different port. I go to System > Faults, select fault F1394, right-click one particular fault occurrence, and try to change its severity to "squelched". The ACI fabric does not let me do that and reports "Failed to apply the changes".
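For reference, the same F1394 occurrences that appear under System > Faults can also be listed through the APIC REST API. The sketch below only builds the class-query URL and makes no network call. The host name `apic.example.com` is a placeholder, and the `faultInst` class and its `code` attribute are taken from the APIC object model as I understand it, so confirm them against your release before relying on this.

```python
from urllib.parse import urlencode


def fault_query_url(apic_host: str, fault_code: str) -> str:
    """Build a class-level query URL for all fault instances with one fault code.

    Assumption: fault instances are exposed as class `faultInst` and the
    fault code (e.g. F1394) is stored in its `code` attribute.
    """
    # APIC filter syntax: eq(<class>.<attribute>,"<value>")
    query = urlencode({"query-target-filter": f'eq(faultInst.code,"{fault_code}")'})
    return f"https://{apic_host}/api/node/class/faultInst.json?{query}"


# apic.example.com is a hypothetical host name used only for illustration.
url = fault_query_url("apic.example.com", "F1394")
print(url)
```

A GET on that URL needs an authenticated session first (a login against `/api/aaaLogin.json`), which is left out here to keep the sketch short.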

I suspected that authorization might be the cause, but my user permissions should allow admin access to the whole fabric. They are currently set as follows:

Domains:
Name: All; Read Privilege = admin; Write Privilege = admin
Accessible Objects:
DN: uni/tn-mgmt; Read Privilege = admin,none; Write Privilege = admin
DN: uni/tn-common; Read Privilege = admin,none; Write Privilege = admin
DN: uni/tn-infra; Read Privilege = admin,none; Write Privilege = admin


I did not find any logs or authorization-denied messages.

Any idea why I can't change the fault severity to squelched, or how to clear fault F1394 in a different way?

Regards,
Alexander

10 Replies

Jayesh Singh
Cisco Employee

Hi Alexander,

 

Disabling the port should clear the fault. Go to Fabric > Inventory > Pod # > Node ID > Physical Interfaces, select the interface, right-click, and choose Disable.
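The GUI step above can presumably be scripted as well. The sketch below builds the JSON body that, to my knowledge, the APIC accepts for taking a port out of service when it is POSTed to `/api/node/mo/uni/fabric/outofsvc.json`. The `fabricRsOosPath` class and the `lc="blacklist"` value are my assumption of what the GUI sends (the API Inspector will show the real request on your release), and pod 1, node 101, and port eth1/49 are hypothetical values.

```python
import json


def port_disable_payload(pod: int, node: int, port: str) -> dict:
    """Return the request body that marks one physical port as out of service.

    Assumption: the disable action is modelled as a `fabricRsOosPath`
    object whose `tDn` points at the port's path endpoint.
    """
    tdn = f"topology/pod-{pod}/paths-{node}/pathep-[{port}]"
    return {"fabricRsOosPath": {"attributes": {"tDn": tdn, "lc": "blacklist"}}}


# Hypothetical leaf (node 101 in pod 1) and uplink port, for illustration only.
payload = port_disable_payload(1, 101, "eth1/49")
print(json.dumps(payload, indent=2))
```

Re-enabling the port would be the same object sent with `"status": "deleted"`, again assuming the model above is right.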

 

Your user access looks OK to me, so I am not sure this is a permission issue. Can you share your APIC version?

 

Let me know if that helps!

Regards,

Jayesh

Hi Jayesh,

 

Disabling the port does indeed clear the F1394 fault, but as soon as I re-enable the port, the fault comes back. For some reason, ACI remembers that the leaf port was once connected to the spine.

 

I thought that setting the fault to "squelched" would wipe ACI's memory of the port having been part of the ACI fabric overlay, but the GUI does not allow me to do that.

 

And yes, this is all happening on the 3.2(3o) release.

 

Regards,

Alexander

 

 

Alexander - If you want to ignore the fault, try configuring the "ignore acknowledged fault" flag. To see how, check out https://unofficialaciguide.com/2017/10/03/ignoring-acknowledged-faults/ @ UnofficialACIGuide.

 

Jody

Hi Jody,

 

Thanks for the hint. I do have the "Ignore Acknowledged Faults" flag ticked, so the fault does not lower the health score once it is acknowledged.

 

Yet the fault still shows up in the System > Faults list, which confuses the ACI customer's support team. They want the fault removed completely, which I understand. The fault was generated a long time ago and has nothing to do with the current state of the ACI fabric.

 

Thanks for the reply anyway,

Alexander

That's right, Alexander. Acknowledging the fault just keeps the APIC from giving weight to that fault type; the fault is still shown.

 

Squelching is like suppressing alarms for a specific fault code, which is what is needed here. It's interesting that the APIC is not allowing you to configure that.

Yes, it's weird that ACI does not allow me to configure this. I even tried turning off TACACS AAA and logging in as the admin user (full privileges), but the behaviour was the same.

 

The other question is why ACI keeps showing the same fault again and again. I was hoping that squelching the fault and then removing the squelch would clear the historical state of the port, but as you can see, I was not allowed to do that.

 

Regards,

Alexander

Hi Alexander,

 

I tried to squelch a fault in my lab on 3.2(3o) and got the same error, "Failed to apply the changes".

One thing I observed: if you go to Fabric > Access Policies > Policies > Monitoring > default > Fault Severity Assignment Policies, it shows BLANK.

 

Normally it should show the Monitoring Object field there (verified on 2.3(1f) and 3.0(2k)), as in the image below: Monitoring_Policy.PNG

 

Seems like a bug. It would be great if someone from Cisco could acknowledge the situation here and let us know if we are missing something.

 

Meanwhile, Alex, if that's your production setup, you can engage TAC for a quick resolution. Let me know how it goes.

 

Regards,

Jayesh

Note: The snapshot used in this post was taken from the Cisco sandbox APIC.

Finally, I have found a way to clear the F1394 fault.

- Go to System > Faults, find the F1394 fault in the table of faults, and double-click the F1394 category.

- Choose one of the fault occurrences, right-click it, and choose "Ignore Fault". The system will warn you that you won't be able to detect faults of this kind anymore. Confirm that you want to ignore the fault.

- Re-enable fault detection (since normally you will want to know that a fabric port went down, right?) by going to Fabric > Fabric Policies > Policies > Monitoring > default > Fault Severity Assignment Policies. On the right, you should see fault F1394 listed. Right-click the fault category and choose to delete the policy. Now your system will detect fabric port failures again.
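The last step, deleting the severity assignment policy that "Ignore Fault" left behind, could in principle be done over the REST API too. The sketch below takes an already fetched class-query response and produces the delete requests for every policy matching one fault code. It assumes the policies are objects of class `faultSevAsnP` with a `code` attribute, and that they can be removed with the usual `status="deleted"` convention. The DNs in the sample data are invented placeholders, not real APIC paths.

```python
def severity_policy_deletes(response: dict, fault_code: str) -> list:
    """Return (url_path, body) pairs that would delete matching policies.

    Assumption: `response` is the JSON returned by a class query on
    `faultSevAsnP`, i.e. {"imdata": [{"faultSevAsnP": {"attributes": {...}}}]}.
    """
    deletes = []
    for item in response.get("imdata", []):
        attrs = item.get("faultSevAsnP", {}).get("attributes", {})
        if attrs.get("code") == fault_code:
            dn = attrs["dn"]
            body = {"faultSevAsnP": {"attributes": {"dn": dn, "status": "deleted"}}}
            deletes.append((f"/api/node/mo/{dn}.json", body))
    return deletes


# Sample data with placeholder DNs (hypothetical, for illustration only).
sample_response = {
    "imdata": [
        {"faultSevAsnP": {"attributes": {"dn": "uni/example/fsevp-F1394", "code": "F1394"}}},
        {"faultSevAsnP": {"attributes": {"dn": "uni/example/fsevp-F0532", "code": "F0532"}}},
    ]
}

deletes = severity_policy_deletes(sample_response, "F1394")
for path, body in deletes:
    print("POST", path, body)
```

Building the requests separately from sending them makes it easy to review what would be deleted before touching the fabric.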

 

This procedure might also work for other kinds of historical faults, as an alternative to changing the fault severity to "squelched". In fact, I think the procedure does the same thing; I just don't know why it works one way and not the other.

 

Good luck with ACI,

Alexander


@Alexander Pickar wrote:

 

 

- Re-enable fault detection (since normally you will want to know that a fabric port went down, right?) by going to Fabric > Fabric Policies > Policies > Monitoring > default > Fault Severity Assignment Policies. On the right, you should see fault F1394 listed. Right-click the fault category and choose to delete the policy. Now your system will detect fabric port failures again.


In the Fault Severity Assignment Policies tab, the right side should also show the monitoring object, which is missing in 3.2(3o) (at least in the 2 pods I checked). That is probably why squelching is not working.

 

It seems that what you discovered is a way to make the APIC forget that those ports were once connected to the fabric, and that re-enabling fault detection lets it generate any new faults... Please confirm whether my understanding is correct.

 

Regards, 

Jayesh

Yes, Jayesh,

 

I have tested this behaviour in our lab twice.

- Using this procedure, the Fault is cleared from the Fault log.

- Also, if a new fault of the same type occurs, it is logged again (of course, only if you have deleted the relevant Fault Severity Assignment Policy).
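Both checks above can be repeated against a saved REST response instead of the GUI fault table. The sketch below lists the DNs of all fault instances in a response that still carry a given code. An empty list after the procedure, and a non-empty one after a fresh port failure, would match the two observations. The `faultInst` response shape is assumed from the APIC object model, and the DNs in the sample are illustrative only.

```python
def remaining_faults(response: dict, fault_code: str) -> list:
    """Return the DNs of fault instances in `response` that have `fault_code`.

    Assumption: `response` is the JSON of a class query on `faultInst`,
    i.e. {"imdata": [{"faultInst": {"attributes": {"dn": ..., "code": ...}}}]}.
    """
    dns = []
    for item in response.get("imdata", []):
        attrs = item.get("faultInst", {}).get("attributes", {})
        if attrs.get("code") == fault_code:
            dns.append(attrs.get("dn", ""))
    return dns


# Illustrative sample: one stale F1394 occurrence and one unrelated fault.
before = {
    "imdata": [
        {"faultInst": {"attributes": {"code": "F1394", "dn": "example/leaf-101/eth1-49/fault-F1394"}}},
        {"faultInst": {"attributes": {"code": "F0532", "dn": "example/leaf-101/eth1-10/fault-F0532"}}},
    ]
}
after = {"imdata": [before["imdata"][1]]}

print(remaining_faults(before, "F1394"))
print(remaining_faults(after, "F1394"))
```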

 

Still, I wonder why Cisco does not allow clearing the fault occurrence from the log straight away, so that you have to use such a complicated workaround.

 

Regards,

Alexander
