Recovering a Disconnected Leaf Errors on ACI

O.K.
Level 1

Hello Everyone, 

We have had some errors in the APIC GUI. They were basically interface errors, but the leaf has since been removed from the fabric; the errors persisted, and it was not possible to remove them in the GUI.

After a quick search, I found documentation (further down, under "Recovering a Disconnected Leaf Using the REST API") that describes how to delete them via the REST API.

I sent the following JSON request via Ansible, and it worked.

 

 

- name: Send desired req. to node 2509 - json
  ansible.builtin.aci_rest:
    hostname: "{{ inventory_hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: false
    path: "/api/policymgr/mo/.json"
    method: post
    content: {"fabricRsOosPath":{"attributes": {"dn":"uni/fabric/outofsvc/rsoosPath-[topology/pod-5/paths-2509/pathep-[eth1/{{int_id}}]]","status":"deleted"}}}
  loop: "{{ range(1,53) | list }}"
  loop_control:
    loop_var: int_id
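
As a cross-check of what this loop sends, here is a minimal Python sketch that builds the same request bodies offline. The function name oos_path_payload is made up for illustration; the pod, node and port range are taken from the task above, where range(1,53) covers ports 1 through 52.

```python
import json


def oos_path_payload(pod: int, node: int, port: int) -> dict:
    """Build the body that marks one fabricRsOosPath object as deleted.

    Hypothetical helper: it mirrors the 'content' of the Ansible task above.
    """
    dn = (
        "uni/fabric/outofsvc/rsoosPath-"
        f"[topology/pod-{pod}/paths-{node}/pathep-[eth1/{port}]]"
    )
    return {"fabricRsOosPath": {"attributes": {"dn": dn, "status": "deleted"}}}


# Same range as the Ansible loop: range(1, 53) yields 1..52.
payloads = [oos_path_payload(5, 2509, port) for port in range(1, 53)]

print(len(payloads))
print(json.dumps(payloads[0]))
```

Printing the first body before sending anything is a cheap way to confirm that the nested square brackets in the DN come out balanced.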

 

 

 

So my question is this: we have the faults F1299 and F1209, which belong to different classes, but the leaves don't exist anymore. When I try to delete them with the same method (with the correct class, of course), nothing happens. Does anyone have any idea?

For example an F1299 looks like:

 

uni/fabric/nodecfgcont/node-2509/rsnodePolGroup-[uni/fabric/funcprof/lenodepgrp-ALL]/source-[uni/fabric/leprof-ALL/leaves-ALL-typ-range]
Fault delegate: Switch profile configuration has not been deployed on node 2509 because: Node Not Leaf For Fabric Policies
Config
configuration-failed

 

and I have used the following play:

 

- name: Send desired req.
  ansible.builtin.aci_rest:
    hostname: "{{ inventory_hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: false
    path: "/api/mo/.json"
    method: post
    content: {"fabricCreatedBy":{"attributes": {"dn":"uni/fabric/nodecfgcont/node-2509/rsnodePolGroup-[uni/fabric/funcprof/lenodepgrp-ALL]/source-[uni/fabric/leprof-ALL/leaves-ALL-typ-range]","status":"deleted"}}}

 

Thanks!

Regards.

8 Replies

AshSe
VIP

So my question is this: we have the faults F1299 and F1209, which belong to different classes, but the leaves don't exist anymore. When I try to delete them with the same method (with the correct class, of course), nothing happens. Does anyone have any idea?



Sorry, your question is not clear. Could you briefly restate it?

Please find below a detailed explanation of both fault types:

Error Code F1299:

Description: This error code generally indicates a fault related to the fabric node. It could be due to a variety of reasons such as connectivity issues, configuration errors, or hardware problems.

Common Causes:

  1. Connectivity issues between APIC and fabric nodes.
  2. Misconfiguration in the fabric settings.
  3. Hardware failures or issues in the fabric nodes.

Troubleshooting Steps:

  1. Check Connectivity: Ensure that the APIC can communicate with all fabric nodes. Verify the physical and logical connections.
  2. Review Configuration: Check the configuration settings on the APIC and fabric nodes to ensure they are correct and consistent.
  3. Inspect Hardware: Look for any hardware issues or alerts on the fabric nodes. Replace or repair any faulty hardware components.
  4. Logs and Documentation: Review the APIC logs and Cisco documentation for more detailed information on the specific fault and recommended actions.
 

Error Code F1209:

Description: This error code typically indicates a fault related to the APIC itself. It could be due to software issues, configuration errors, or resource constraints.

Common Causes:

  1. Software bugs or issues in the APIC firmware.
  2. Misconfiguration in the APIC settings.
  3. Resource constraints such as CPU, memory, or storage limitations.

Troubleshooting Steps:

  1. Update Firmware: Ensure that the APIC is running the latest firmware version. Apply any available patches or updates.
  2. Review Configuration: Check the APIC configuration settings for any errors or inconsistencies.
  3. Monitor Resources: Monitor the APIC's resource usage (CPU, memory, storage) to identify any constraints or bottlenecks.
  4. Logs and Documentation: Review the APIC logs and Cisco documentation for more detailed information on the specific fault and recommended actions.

O.K.
Level 1

Hello @AshSe ,
Thank you for your reply!

Basically, my question is how to get rid of these error messages. I can't delete them from the APIC GUI or via the API, because the nodes have been deleted (decommissioned) from the fabric and don't exist anymore.

AshSe
VIP

Hello @O.K.  

To address the F1299 and F1209 error messages in Cisco APIC when the nodes have already been decommissioned and no longer exist in the fabric, you can follow these steps:

  1. Clear Faults via CLI: Sometimes, clearing the faults directly from the APIC CLI can help. You can use the following commands to clear the faults:

Screenshot 2024-10-23 at 2.31.21 PM.png

This will list the faults. To clear them, you can use:

Screenshot 2024-10-23 at 2.32.17 PM.png

2. Use the REST API to Clear Faults: If you prefer using the API, you can send a DELETE request to the fault instance. Here’s an example using curl:

Screenshot 2024-10-23 at 2.34.23 PM.png

Replace <APIC_IP> with the IP address of your APIC and admin:password with your APIC credentials.

3. Check for Residual Configuration: Ensure there are no residual configurations or references to the decommissioned nodes. Sometimes, stale configurations can cause persistent faults. You can check for any remaining references using the following:

 Screenshot 2024-10-23 at 2.36.17 PM.png

If you find any references to the decommissioned nodes, you can delete them using:

Screenshot 2024-10-23 at 2.37.08 PM.png

4. APIC Reboot: As a last resort, if the faults persist and you have confirmed that there are no residual configurations, you might consider rebooting the APIC controllers. This can sometimes clear out stale faults.

Screenshot 2024-10-23 at 2.38.17 PM.png

Note: Rebooting the APIC controllers should be done during a maintenance window as it will temporarily disrupt the management plane.

Can you post the code of the curl command? The screenshot gets cut off. 

Thanks!

Hello @AshSe 
Thank you for your post. 

Can you please paste the commands as text instead of pictures?
The pictures are cut off, so it is not possible to see the whole command.

Thanks in advance!

Hello @O.K. Here are the steps and commands to clear the specific fault codes (F1299 and F1209) via the APIC CLI:

  1. Access the APIC CLI:

    1. SSH into your APIC controller.
    ssh admin@<APIC_IP>
     
  2. Navigate to the Faults:

    • Use the following commands to navigate to the fault instances and delete them.
  3. Clear Fault F1299:

    • First, find the fault instance:
    moquery -c faultInst | grep F1299
     
    • This will give you the distinguished name (DN) of the fault instance. You can also filter on the fault code directly:
    moquery -c faultInst -f 'fault.Inst.code=="F1299"'
     
    • If you find the exact DN, you can delete it:
    mo del <DN>
     

    For example, if the DN is uni/fault-F1299, you would run:

    mo del uni/fault-F1299
     
  4. Clear Fault F1209:

    • Similarly, find the fault instance:
    moquery -c faultInst | grep F1209
     
    • This will give you the DN of the fault instance. You can also filter on the fault code directly:
    moquery -c faultInst -f 'fault.Inst.code=="F1209"'
     
    • If you find the exact DN, you can delete it:
    mo del <DN>
     

    For example, if the DN is uni/fault-F1209, you would run:

    mo del uni/fault-F1209
     
  5. Verify the Faults are Cleared:

    • After deleting the faults, you can verify that they are cleared by running:
    moquery -c faultInst
     
    • Ensure that the fault codes F1299 and F1209 no longer appear in the output.

By following these steps, you should be able to clear the specific fault codes from your Cisco ACI fabric using the APIC CLI.

 

Below are the REST API commands to clear the specific fault codes (F1299 and F1209) using the Cisco ACI REST API:

  1. Log in to the APIC and get a token:

    First, you need to authenticate and obtain a session token. Replace <APIC_IP>, <username>, and <password> with your APIC's IP address and your login credentials.

    curl -k -X POST https://<APIC_IP>/api/aaaLogin.json -d '{ "aaaUser": { "attributes": { "name": "<username>", "pwd": "<password>" } } }' -c cookies.txt
     
  2. Find the Fault Instances:

    To find the specific fault instances, you can query the fault instances and filter by the fault code.

    curl -k -X GET https://<APIC_IP>/api/node/class/faultInst.json -b cookies.txt | jq '.imdata[] | select(.faultInst.attributes.code == "F1299")'
     
    curl -k -X GET https://<APIC_IP>/api/node/class/faultInst.json -b cookies.txt | jq '.imdata[] | select(.faultInst.attributes.code == "F1209")'
     

    This will give you the distinguished names (DNs) of the fault instances.

  3. Clear Fault F1299:

    Once you have the DN for the fault instance, you can delete it. Replace <DN> with the actual DN of the fault instance.

    curl -k -X POST https://<APIC_IP>/api/node/mo/<DN>.json -b cookies.txt -d '{ "faultInst": { "attributes": { "status": "deleted" } } }'
     

    For example, if the DN is uni/fault-F1299, you would run:

    curl -k -X POST https://<APIC_IP>/api/node/mo/uni/fault-F1299.json -b cookies.txt -d '{ "faultInst": { "attributes": { "status": "deleted" } } }'
     
  4. Clear Fault F1209:

    Similarly, once you have the DN for the fault instance, you can delete it. Replace <DN> with the actual DN of the fault instance.

    curl -k -X POST https://<APIC_IP>/api/node/mo/<DN>.json -b cookies.txt -d '{ "faultInst": { "attributes": { "status": "deleted" } } }'
     

    For example, if the DN is uni/fault-F1209, you would run:

    curl -k -X POST https://<APIC_IP>/api/node/mo/uni/fault-F1209.json -b cookies.txt -d '{ "faultInst": { "attributes": { "status": "deleted" } } }'
     
  5. Verify the Faults are Cleared:

    After deleting the faults, you can verify that they are cleared by querying the fault instances again:

    curl -k -X GET https://<APIC_IP>/api/node/class/faultInst.json -b cookies.txt
     

    Ensure that the fault codes F1299 and F1209 no longer appear in the output.
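
The lookup in step 2 can also be filtered on the APIC side instead of piping the full fault list through jq. Below is a minimal Python sketch, assuming the standard query-target-filter option of APIC class queries. The helper names fault_query_url and fault_dns, the host apic.example.com, and the sample response are invented for illustration. The sketch only builds the URL and parses a response; it does not contact an APIC.

```python
from urllib.parse import urlencode


def fault_query_url(apic_host: str, fault_code: str) -> str:
    """Class query for faultInst, filtered by fault code on the APIC side."""
    query = urlencode(
        {"query-target-filter": f'eq(faultInst.code,"{fault_code}")'}
    )
    return f"https://{apic_host}/api/node/class/faultInst.json?{query}"


def fault_dns(response_body: dict) -> list:
    """Collect the dn attribute of every faultInst in a class-query response."""
    return [
        item["faultInst"]["attributes"]["dn"]
        for item in response_body.get("imdata", [])
        if "faultInst" in item
    ]


# Hypothetical, trimmed-down response used only to exercise the parser.
sample = {
    "imdata": [
        {"faultInst": {"attributes": {"code": "F1299",
                                      "dn": "uni/example/fault-F1299"}}}
    ]
}

print(fault_query_url("apic.example.com", "F1299"))
print(fault_dns(sample))
```

Real fault DNs are usually much longer than uni/fault-F1299 and often contain square brackets, so it is safer to copy them from the query result than to type them by hand.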

 

Hope This Helps!!!

 

AshSe


Hello @AshSe 
Thank you again for your post. 

Are you sure these commands work? Have you ever tried them in a real environment?

I'm asking because I tested them in two different environments with different fault codes (F1209, F1299 and F2873), and they don't seem to work.

As an example:
If I follow the steps:

tst-apic# curl -k -X GET https://tst-apic/api/node/class/faultInst.json -b cookies.txt | jq '.imdata[] | select(.faultInst.attributes.code == "F1299")'

gives:

...
{
  "faultInst": {
    "attributes": {
      "ack": "no",
      "alert": "no",
      "cause": "configuration-failed",
      "changeSet": "deplSt (Old: delivered, New: node-not-ready)",
      "childAction": "",
      "code": "F1299",
      "created": "2024-08-27T15:35:38.933+01:00",
      "delegated": "yes",
      "descr": "A profile configuration has not been deployed on node non-retrievable(852:887), because: Node Not Ready",
      "dn": "uni/infra/nodecfgcont/node-113/rstoInterfacePolProfile-[uni/infra/accportprof-__ui_pps_n113]/source-[uni/infra/nprof-__ui_pps_n113]/fault-F1299",
      "domain": "infra",
      "highestSeverity": "minor",
      "lastTransition": "2024-08-27T15:37:41.832+01:00",
      "lc": "raised",
      "occur": "1",
      "origSeverity": "minor",
      "prevSeverity": "minor",
      "rule": "fabric-created-by-configuration-failed",
      "severity": "minor",
      "status": "",
      "subject": "fabric-node",
      "title": "",
      "type": "config"
    }
  }
}
...

and if I try with the "dn":

curl -k -X POST https://tst-apic/api/node/mo/uni/infra/nodecfgcont/node-113/rstoInterfacePolProfileOpt-[uni/infra/accportprof-__ui_pps_n113]/source-[uni/infra/nprof-__ui_pps_n113]/fault-F1299.json -b cookies.txt -d '{ "faultInst": { "attributes": { "status": "deleted" } } }'

following error occurs:

zsh: no matches found: https://tst-apic/api/node/mo/uni/infra/nodecfgcont/node-113/rstoInterfacePolProfileOpt-[uni/infra/accportprof-__ui_pps_n113]/source-[uni/infra/nprof-__ui_pps_n113]/fault-F1299.json

Besides that, the command "mo del" doesn't exist, as far as I can tell!
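
A note on the "zsh: no matches found" message above: it comes from the local shell, not from the APIC. zsh treats the unquoted square brackets in the URL as a filename pattern and aborts before curl ever runs. curl also applies its own URL globbing to [] and {}, which the -g (--globoff) option turns off. The Python sketch below uses shlex to print a safely quoted version of the command. The function name curl_post_command is made up for illustration, and the DN is the one returned by the query output above. It only addresses the quoting problem; whether the APIC then accepts a delete on a faultInst object is a separate question that this sketch does not answer.

```python
import shlex

# DN exactly as returned in the faultInst query output above.
FAULT_DN = (
    "uni/infra/nodecfgcont/node-113/"
    "rstoInterfacePolProfile-[uni/infra/accportprof-__ui_pps_n113]/"
    "source-[uni/infra/nprof-__ui_pps_n113]/fault-F1299"
)


def curl_post_command(apic_host: str, dn: str, body: str) -> str:
    """Return a curl command line with every argument quoted for the shell.

    Hypothetical helper: it only formats the command, it does not run it.
    """
    url = f"https://{apic_host}/api/node/mo/{dn}.json"
    # -g disables curl's own interpretation of [] and {} in the URL.
    argv = ["curl", "-k", "-g", "-X", "POST", url,
            "-b", "cookies.txt", "-d", body]
    return shlex.join(argv)


print(curl_post_command(
    "tst-apic",
    FAULT_DN,
    '{ "faultInst": { "attributes": { "status": "deleted" } } }',
))
```

The printed command wraps the URL in single quotes, which is the same fix you would apply by hand in zsh or bash.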

 
