
PFC Question

visitor68
Level 4

Hi--

How does PFC decide which traffic class to pause when it's time to send out a PAUSE frame back to the sender? I haven't read a single white paper or blog post anywhere that explains that. All that is explained is that "during congestion" (which is seldom ever defined to boot, but I know now what's meant by that thanks to a Cisco white paper on PFC), a PFC-enabled switch port will send a PAUSE frame back to the sender based on the CoS value of the traffic.

What is never explained is how PFC selects the traffic to pause. Is it that the highest CoS values are paused first? Is it the lowest CoS value that's paused first? Is it the traffic class that is utilizing the most BW during a defined period of congestion that's paused first?

I hope someone knows.

Thanks

18 Replies

habadr
Cisco Employee

It is configured as part of QoS System Classes

http://www.cisco.com/en/US/partner/docs/switches/datacenter/nexus5000/sw/qos/502_n1_1/Cisco_Nexus_5000_Series_NX-OS_Quality_of_Service_Configuration_Guide_Rel_502_N1_1_chapter3.html#con_4217356593802126541

By default, on the 5010 and 5020, the FCoE system class treats any traffic marked with CoS 3 as a no-drop class, and PFC is applied to it. On the 5548 you have to define it yourself.

Thanks

Hatim Badr

Badr:

I tried going to the link, but I do not have permission.

Forbidden File or Application

The file or application you are  trying to access may require additional entitlement or you are trying to  access a file with an invalid name. Additional entitlement levels are  granted based on a users relationship with Cisco on a per-application  basis.

Let me see if I understand you correctly, though. You are saying that on a Nexus 5010/5020, FCoE traffic is automatically classified as CoS 3. Correct?

I'm not sure that answers my question, though. So, how does the fact that FCoE traffic is labeled as CoS 3 play a role when PFC sends a PAUSE frame? Are you saying that PAUSE frames are only sent for FCoE traffic and that FCoE traffic is labeled as CoS 3, so only CoS 3 traffic gets throttled in times of congestion?

Thanks

Hi

I think you need to log in to cisco.com to get access to this link (attached)

Priority Flow Control

Priority flow control (PFC) allows you to apply pause functionality to specific classes of traffic on a link instead of all the traffic on the link. PFC applies pause functionality based on the IEEE 802.1p CoS value. When the switch enables PFC, it communicates to the adapter which CoS values to apply the pause to.

Ethernet interfaces use PFC to provide lossless service to no-drop system classes. PFC implements pause frames on a per-class basis and uses the IEEE 802.1p CoS value to identify the classes that require lossless service.

In the switch, each system class has an associated IEEE 802.1p CoS value that is assigned by default or configured on the system class. If you enable PFC, the switch sends the no-drop CoS values to the adapter, which then applies PFC to these CoS values.

The default CoS value for the FCoE system class is 3. This value is configurable.

By default, the switch negotiates to enable the PFC capability. If the negotiation succeeds, PFC is enabled and link-level flow control remains disabled regardless of its configuration settings. If the PFC negotiation fails, you can either force PFC to be enabled on the interface or you can enable IEEE 802.3x link-level flow control.

If you do not enable PFC on an interface, you can enable IEEE 802.3x link-level pause. By default, link-level pause is disabled.
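To make "pause per CoS value" concrete, here is a minimal Python sketch of what a PFC frame looks like on the wire. The field layout (MAC Control EtherType 0x8808, PFC opcode 0x0101, a priority-enable vector, and eight per-priority timers in units of 512 bit times) follows IEEE 802.1Qbb; the function name `build_pfc_frame` and the all-zero source MAC are made up for illustration and say nothing about how NX-OS builds the frame internally.

```python
import struct
from typing import Mapping

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101  # 802.3x link-level PAUSE uses opcode 0x0001 instead


def build_pfc_frame(src_mac: bytes, pause_quanta: Mapping[int, int]) -> bytes:
    """Build a PFC frame (FCS excluded) that pauses the given priorities.

    pause_quanta maps a priority (0-7) to a pause time in quanta
    (1 quantum = 512 bit times). A time of 0 tells the peer to resume.
    Hypothetical helper, for illustration only.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError("priority must be 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << prio  # one bit per priority that this frame affects
        timers[prio] = quanta
    header = PFC_DEST_MAC + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector
    )
    frame = header + struct.pack("!8H", *timers)
    return frame.ljust(60, b"\x00")  # pad to the minimum Ethernet frame size


# Pause only priority 3 (the default FCoE CoS) for the maximum time.
frame = build_pfc_frame(bytes(6), {3: 0xFFFF})
print(frame[:34].hex())
```

Only the bit for priority 3 is set in the enable vector, so traffic in the other seven priorities is not affected by this frame.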


Default System Classes

The Cisco Nexus 5000 Series switch provides the following system classes:


  • Drop system class

    By default, the software classifies all unicast and multicast Ethernet traffic into the default drop system class. This class is identified by qos-group 0.

    This class is created automatically when the system starts up (the class is named class-default in the CLI). You cannot delete this class and you cannot change the match criteria associated with the default class.

  • FCoE system class (For the Cisco Nexus 5010 switch and the Cisco Nexus 5020 switch)

    All Fibre Channel and FCoE control and data traffic is automatically classified into the FCoE system class, which provides no-drop service.

    This class is created automatically when the system starts up (the class is named class-fcoe in the CLI).

    You cannot delete class-fcoe and you can only modify the IEEE 802.1p CoS value to associate with this class. This class is identified by qos-group 1.

    The switch classifies packets into the FCoE system class as follows:


    • FCoE traffic is classified based on EtherType.

    • Native Fibre Channel traffic is classified based on the physical interface type.


      Note


      The optional N5K-M1404 or N5K-M1008 expansion modules provide native 1/2/4-Gigabit Fibre Channel ports.


  • FCoE system class (For the Cisco Nexus 5548 switch)

    For the Cisco Nexus 5548 switch, the class-fcoe is not automatically created. Before you enable FCoE on the Cisco Nexus 5548 switch running Cisco NX-OS Release 5.0(2)N1(1), you must enable class-fcoe in the three types of qos policies:


    • type qos policy maps

    • type network-qos policy map (attached to system qos)

    • type queuing policy map (class-fcoe must be configured with a non-zero bandwidth percentage for input queuing policy maps).

      When class-fcoe is not included in the qos policies, vFC interfaces do not come up and increased drops occur.


    Note


    The Cisco Nexus 5548 switch supports five user-defined classes and one default drop system class

Please let me know if you need more information

Thanks

Hatim Badr

Hatim, once again, thank you.

Can you kindly answer this direct question...directly?

So, how does the fact that FCoE traffic is labeled as CoS 3 play a role when PFC sends a PAUSE frame? Are you saying that PAUSE frames are only sent for FCoE traffic and that FCoE traffic is labeled as CoS 3, so only CoS 3 traffic gets throttled in times of congestion?

Thanks

Hi ex-engineer

Sorry for not answering your question directly last time

Pause frames are sent for traffic matching the class configured as lossless ("no-drop"). On the Nexus 5010 and 5020 that is the FCoE system class, which is configured as CoS 3, the default for FCoE traffic. This CoS value is configurable, however.

By the way, a Pause frame is not sent because of link congestion as such; it is sent when the N5K switch reaches its high buffer threshold.

I found this article useful; it explains it well:

“The Cisco Nexus 5000 provides lossless ethernet services for the FCoE traffic received from the CNA. If the Nexus 5000 buffers reach a high threshold an 802.3x pause signal with the CoS equal to FCoE will be sent to the CNA. This per CoS pause signal tells the CNA to pause the FCoE traffic only, not the other TCP/IP traffic that is tolerant to loss. The default CoS setting for FCoE is COS 3. When the Nexus 5000 buffers reach a low threshold, a similar un-pause signal is sent to the CNA. The 802.3x per CoS pause provides the same functionality as FC buffer credits, controlling throughput based on the networks ability to carry the traffic reliably.”

http://bradhedlund.com/2009/01/01/nexus-1000v-with-fcoe-cna-and-vmware-esx-40-deployment-diagram/
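The high/low threshold behaviour described in that quote is a plain hysteresis loop, which can be sketched in a few lines of Python. This is only an illustration: the class name `NoDropQueue`, the threshold numbers, and the abstract "units" of queue depth are assumptions, not the switch's actual implementation.

```python
class NoDropQueue:
    """Ingress queue for one no-drop CoS with xoff/xon (high/low) thresholds."""

    def __init__(self, xoff: int, xon: int):
        if xon >= xoff:
            raise ValueError("xon (resume) must be below xoff (pause)")
        self.xoff = xoff      # high threshold: tell the sender to stop
        self.xon = xon        # low threshold: tell the sender to resume
        self.depth = 0        # current queue occupancy, in abstract units
        self.paused = False   # True while the sender has been told to stop

    def _signal(self):
        # Emit a signal only when a threshold is crossed, not on every frame.
        if not self.paused and self.depth >= self.xoff:
            self.paused = True
            return "PAUSE"
        if self.paused and self.depth <= self.xon:
            self.paused = False
            return "RESUME"
        return None

    def enqueue(self, units: int = 1):
        self.depth += units
        return self._signal()

    def dequeue(self, units: int = 1):
        self.depth = max(0, self.depth - units)
        return self._signal()


q = NoDropQueue(xoff=240, xon=128)   # illustrative threshold values
print(q.enqueue(239))  # None: still below the high threshold
print(q.enqueue(1))    # PAUSE: high threshold reached
print(q.dequeue(50))   # None: draining, but still above the low threshold
print(q.dequeue(62))   # RESUME: back down to the low threshold
```

The gap between the two thresholds keeps the port from flapping between pause and resume on every frame.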

Thanks

Hatim

Hatim, now you're talking, buddy!

OK, so this raises an interesting question. But just a quick review...

FCoE frames are PAUSED to prevent them from being dropped. The reason is that FC is intolerant to loss and its ULPs do not have any recovery mechanism built in because they assume a lossless fabric is being used as a transport mechanism (buffer-to-buffer credits).

So, the questions:

  • Are the other types of traffic also PAUSED? I believe that when PFC is enabled, 802.3X PAUSE is disabled. So, now the other traffic classes are not being throttled back. And the reason is that they do have recovery mechanisms in place, such as TCP retransmissions (Not sure what would be done for UDP,  by the way. Seems like we would have to depend on the application itself). Anyway, since TCP/UDP traffic is not being PAUSED, what stops them from hogging the receive buffers on the receiving device?

  • Remember, as we just said, the PAUSE frame is sent to the sender and therefore FCoE traffic is PAUSED when receive buffers reach a critical level. But what stops the other traffic from using those buffers and maintaining a constant state of criticality for FCoE?

By the way, I had a friend from Lebanon named Hatim. Do you speak Arabic?

Regards

Hi Ex-Engineer

This is where the Nexus 5K hardware architecture and VOQs (virtual output queues) come into play

"The Cisco Nexus 5000 Series implements virtual output queues (VOQs) on all ingress interfaces, so that a congested egress port does not affect traffic directed to other egress ports. But virtual output queuing does not stop there: every IEEE 802.1p class of service (CoS) uses a separate VOQ in the Cisco Nexus 5000 Series architecture, resulting in a total of 8 VOQs per egress on each ingress interface, or a total of 416 VOQs on each ingress interface. The extensive use of VOQs in the system helps ensure maximum throughput on a per-egress, per-CoS basis. Congestion on one egress port in one CoS does not affect traffic destined for other CoSs or other egress interfaces, thus avoiding head-of-line (HOL) blocking, which would otherwise cause congestion to spread"

http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-462176.html

So CoS 3 (FCoE) traffic will have a dedicated output queue, which should not be impacted by other types of traffic.
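As a rough illustration of the per-egress, per-CoS queuing described in the white paper quote, here is a minimal Python sketch. The `IngressPort` class and the `can_send` callback (a stand-in for the arbiter) are hypothetical names; the real hardware does this in the forwarding ASIC. Going by the quoted figures, 416 VOQs at 8 per egress works out to 52 egress destinations per ingress interface.

```python
from collections import deque


class IngressPort:
    """Ingress port with one virtual output queue per (egress port, CoS) pair."""

    def __init__(self):
        self.voqs = {}  # (egress, cos) -> deque of frames waiting for that destination

    def enqueue(self, egress, cos, frame):
        self.voqs.setdefault((egress, cos), deque()).append(frame)

    def forward(self, can_send):
        """Send the head frame of every VOQ whose (egress, cos) is ready.

        can_send(egress, cos) is a hypothetical stand-in for the arbiter.
        """
        sent = []
        for (egress, cos), queue in self.voqs.items():
            if queue and can_send(egress, cos):
                sent.append(queue.popleft())
        return sent


port = IngressPort()
port.enqueue("1/4", 3, "fcoe-to-1/4")
port.enqueue("1/8", 3, "fcoe-to-1/8")
port.enqueue("1/4", 0, "ip-to-1/4")

# Egress 1/4 is busy for CoS 3 only; everything else can go.
busy = lambda egress, cos: not (egress == "1/4" and cos == 3)
print(port.forward(busy))  # ['fcoe-to-1/8', 'ip-to-1/4']
```

The frame stuck behind the busy (1/4, CoS 3) destination waits in its own queue, so it does not hold up frames for other egress ports or other CoS values.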

Also please note that by default the FCoE traffic (traffic that maps to the FCoE system class) is assigned to an egress queue. This queue uses weighted round-robin (WRR) scheduling with 50 percent of the bandwidth.

I do speak Arabic

Thanks

Hatim Badr

Hatim, thank you.

I'm still a bit confused, and I think the reason is that I need to see a diagram of the architecture. I'm trying to figure out what is specific to an interface and what is specific to a CoS value....

Are there any good diagrams?

A Nexus 5K has 480KB of input buffer capacity. Are those buffers for the entire interface or is it by CoS?

I'm sorry for all the questions - it's not easy to grasp without a visual. The documents you sent me don't show the architecture of the VOQs and input buffers...

Anyway, I speak Arabic too. I'm from Palestine.

Thank you

Hi Ex-engineer

I agree with you that a diagram would make it simpler. I do not have one, but let me try to explain it to you.

The buffer is for the entire interface, but the most important part is that the ingress interface, with VOQ, knows whether the output queue is full or not. If it is full, it will try to hold the frame in the ingress buffer; if it cannot buffer it, then it will send a Pause frame to the sender.

Thanks

Hatim Badr

Hatim or anyone else,

OK, let me see if I got it, my brother. :-)

The interface on the Nexus 5010 and 5020 has one set of buffers that are available to all Ethernet traffic. That means FCoE and IP.

Now, to make a determination of when the interface needs to send a PAUSE frame, it needs to take into consideration the availability of buffers on the egress interfaces and this is what the VOQ functionality provides. Through VOQ, the ingress interface's forwarding logic can query the arbiter to find out if an egress queue/buffer is available. If so, it will forward the packet to the egress interface, thereby freeing up some of its ingress buffers. Great!

Let's get a bit more specific. Let's say the ingress interface is 1/3 and there are frames that need to go to 1/4 and 1/8 in 1/3's buffers. Now, let's say that egress buffers for 1/4 are not available at the time that the arbiter is queried. That means that the ingress interface's forwarding logic will have to store the frame in its buffers while egress interface 1/4's buffers open up. To prevent HOL blocking, a VOQ for each and every interface on the switch exists on the ingress interface. So, even if interface 1/4 is not available to accept that frame from 1/3, that does NOT mean the other frame destined for 1/8 will have to wait for the packet destined for 1/4 to be forwarded; it can be forwarded immediately to its egress queue.

Is ALL this correct, Haj??

Let's say all this is correct. I am still not convinced that this is enough to control the flow of IP traffic such that it does not use all the ingress buffers and leave FCoE traffic in a PAUSEd state for an extended period of time. I think, CORRECT ME if I am wrong, that PFC does allow for a PAUSE frame to PAUSE IP traffic, too. The IP traffic has to be shaped/policed somehow. You can have an IP application that is going ape sh-t and pulverizing the ingress interface such that even VOQ and QoS aren't 100% successful in managing the IP traffic. The buffers can theoretically always remain in a critical threshold state.

I think that PFC does allow IP traffic to be PAUSEd as well as FCoE.

Thoughts?

By the way, I have had this discussion with several very astute and seasoned engineers from Cisco and Brocade and there is a lot of confusion with regard to this point.

Hi Ex-Engineer,

By default only the FCoE class is configured as no-drop. You can configure additional classes as lossless (no-drop) as well; however, to achieve lossless Ethernet in both directions, the devices connected to the Cisco Nexus 5000 switch must have a similar capability.

There is a dedicated ingress buffer for the FCoE class, and if you decide to configure another class as no-drop (lossless), a dedicated ingress buffer will be allocated for it as well.

Beginning with Cisco NX-OS Release 5.0(2)N1(1), you can configure the no-drop buffer threshold settings.

http://www.cisco.com/en/US/partner/docs/switches/datacenter/nexus5000/sw/qos/503_n1_1/cisco_nexus_5000_qos_config_gd_503_chapter3.html#task_81ABFBE86A57475DA65966D5C9BC24A1

To see the queue buffers you can issue show queuing interface (the following is output from a Nexus 5010 switch; please see the q-size and drop-type lines for each qos-group)

Rack6-2-5010P# show queuing interface e1/1

Interface Ethernet1/1 TX Queuing
qos-group  sched-type  oper-bandwidth
    0       WRR             50
    1       WRR             50

Interface Ethernet1/1 RX Queuing
qos-group  0:
    q-size: 243200, MTU: 1538
    drop-type: drop, xon: 0, xoff: 1520
    Statistics:
        Pkts received over the port             : 0
        Ucast pkts sent to the cross-bar        : 0
        Mcast pkts sent to the cross-bar        : 0
        Ucast pkts received from the cross-bar  : 0
        Pkts sent to the port                   : 0
        Pkts discarded on ingress               : 0
        Per-priority-pause status               : Rx (Inactive), Tx (Inactive)

qos-group  1:
    q-size: 76800, MTU: 2240
    drop-type: no-drop, xon: 128, xoff: 240

    Statistics:
        Pkts received over the port             : 0
        Ucast pkts sent to the cross-bar        : 0
        Mcast pkts sent to the cross-bar        : 0
        Ucast pkts received from the cross-bar  : 0
        Pkts sent to the port                   : 0
        Pkts discarded on ingress               : 0
        Per-priority-pause status               : Rx (Inactive), Tx (Inactive)

Total Multicast crossbar statistics:
    Mcast pkts received from the cross-bar      : 0
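If you want to pull the per-class buffer settings out of output like the above, a short Python sketch along these lines would do it. It assumes the text format shown in this post (the regular expressions are written against it and nothing else), and `parse_rx_queuing` is a hypothetical helper name. The units of q-size, xon and xoff are whatever the switch reports; the script does not interpret them.

```python
import re

# Trimmed copy of the RX queuing section from the output above.
SAMPLE = """\
Interface Ethernet1/1 RX Queuing
qos-group  0:
    q-size: 243200, MTU: 1538
    drop-type: drop, xon: 0, xoff: 1520
qos-group  1:
    q-size: 76800, MTU: 2240
    drop-type: no-drop, xon: 128, xoff: 240
"""

GROUP_RE = re.compile(r"^qos-group\s+(\d+):\s*$")
QSIZE_RE = re.compile(r"q-size:\s*(\d+),\s*MTU:\s*(\d+)")
DROP_RE = re.compile(r"drop-type:\s*([\w-]+),\s*xon:\s*(\d+),\s*xoff:\s*(\d+)")


def parse_rx_queuing(text):
    """Return {qos_group: {q_size, mtu, drop_type, xon, xoff}} from CLI text."""
    groups = {}
    current = None
    for line in text.splitlines():
        m = GROUP_RE.match(line.strip())
        if m:
            # A "qos-group N:" header starts a new per-class section.
            current = groups.setdefault(int(m.group(1)), {})
            continue
        if current is None:
            continue  # ignore anything before the first per-class section
        m = QSIZE_RE.search(line)
        if m:
            current["q_size"] = int(m.group(1))
            current["mtu"] = int(m.group(2))
        m = DROP_RE.search(line)
        if m:
            current["drop_type"] = m.group(1)
            current["xon"] = int(m.group(2))
            current["xoff"] = int(m.group(3))
    return groups


for group, info in parse_rx_queuing(SAMPLE).items():
    print(group, info)
```

In this output the two q-size values add up to 320000, with the no-drop class (qos-group 1) holding 76800 of that.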

Thanks

Hatim Badr

Hatim, thank you for all your time and effort. This is indeed a tedious discussion to have in email because it's very detailed. If you are willing to chat on the phone, I will give you my number. No worries.

Let's please step away from talk about the product for a moment. Yes, I am interested in knowing how the Nexus works, not because I have one to configure but because it provides a good example of how the open standard technology is implemented. However, at this time I am trying to have a vendor-agnostic discussion about 802.1Qbb itself and its application so that I can understand its nuances.

In general then, PFC, as an open standard, can PAUSE any type of traffic, not just FCoE. You simply have to inform the switch which CoS should get PAUSEd (in Cisco, that is called the no-drop System Class of traffic, which can contain several different classes of traffic with different CoS values). Furthermore, by pausing the traffic, you are making it lossless. CORRECT?

I needed to confirm those concepts.

So, now talking about the N5K, you are saying that the no-drop system class (which by default on the Nexus 5010 and 5020 includes FCoE at CoS 3) is given its own set of buffers that only that system class uses? If that's the case, then THAT is the answer to my question about throttling IP traffic and my worry that IP will use all the available buffers. You have to understand that I don't want to PAUSE IP traffic because I want to make it lossless. My concern, as I stated before, is that the IP traffic may hog up all the buffer space, leaving FCoE traffic out in the cold - in other words, leaving FCoE in a constant state of PAUSE. Understand? That is why I asked if IP traffic can be PAUSEd!

So, if the answer to that is that FCoE traffic will NOT compete with IP traffic because it will get its own set of buffers that IP will NOT be allowed to use, then that is the answer. Is that the answer?

Regards

Hi Ex-Engineer,

Please see my answers inline

In general then, PFC, as an open standard, can PAUSE *any* type of traffic, not just FCoE. You simply have to inform the switch which CoS should get PAUSEd (in Cisco, that is called the no-drop System Class of traffic, which can contain several different classes of traffic with different CoS values).

HB: You are right, PFC can pause any traffic class, but for our discussion we used FCoE as the application that utilizes the PFC standard.

Furthermore, by pausing the traffic, you are making it lossless. *CORRECT?*

HB: Correct

So, if the answer to that is that FCoE traffic will NOT compete with IP traffic because it will get its own set of buffers that IP will NOT be allowed to use, then that is the answer. Is that the answer?

HB: Yes, by default (on the 5010 and 5020) the only no-drop class is the FCoE class, and it has its own dedicated ingress buffer. If you configure other no-drop classes, they will be allocated dedicated buffers as well.

Please note: For the Cisco Nexus 5548 switch, the class-fcoe is not automatically created. Before you enable FCoE on the Cisco Nexus 5548 switch running Cisco NX-OS Release 5.0(2)N1(1), you must enable class-fcoe in the three types of qos policies (type qos, type network-qos, and type queuing).

I hope that clarifies it and thanks again for the valuable discussion.
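A toy Python model shows why dedicated per-class buffers answer the "IP hogs the buffers" worry. Everything here is an assumption for illustration: the `ClassBuffer` name, the byte-based accounting, the frame sizes, and the xoff value of 60000. Only the two buffer sizes are borrowed from the show queuing output earlier in the thread.

```python
class ClassBuffer:
    """Ingress buffer dedicated to one system class (byte counts are illustrative)."""

    def __init__(self, size, no_drop, xoff=0):
        self.size = size
        self.no_drop = no_drop
        self.xoff = xoff     # pause threshold; only used for no-drop classes
        self.used = 0
        self.dropped = 0

    def receive(self, nbytes):
        """Return 'queued', 'queued+pause', or 'dropped' for one arriving frame."""
        if self.used + nbytes > self.size:
            # No room left in this class's own buffer. For a no-drop class this
            # should not happen if xoff leaves enough headroom for in-flight frames.
            self.dropped += nbytes
            return "dropped"
        self.used += nbytes
        if self.no_drop and self.used >= self.xoff:
            return "queued+pause"
        return "queued"


ip = ClassBuffer(size=243200, no_drop=False)               # class-default
fcoe = ClassBuffer(size=76800, no_drop=True, xoff=60000)   # class-fcoe, hypothetical xoff

# An IP application floods the port with 200 frames of 1500 bytes.
for _ in range(200):
    ip.receive(1500)

print(ip.used, ip.dropped)   # IP fills its own buffer, the excess is tail-dropped
print(fcoe.receive(2240))    # FCoE still has its whole buffer available
```

However hard the IP class is pushed, it can only overflow its own buffer, so the FCoE class is paused only by its own occupancy.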

Hatim Badr

Hatim, hello!

Sorry I disappeared. I have been so busy.

Thank you VERY much for all your help and guidance. I greatly appreciate it. It is very kind of you to take time to read my long, tedious questions and answer them. Unfortunately, I have no choice but to bug you on here because there are VERY FEW people who have even heard of what we are talking about now. If I went to my colleagues from my old job (some of them CCIEs), I guarantee you they would be looking at me like I'm speaking Chinese to them. They have N5Ks deployed, but only as 10G switches with vPC, nothing else. No FCoE, no PFC, no ETS - nada!

Anyway, I have a few follow-up questions, please...Hatim, please read the questions carefully because it is crucial to my understanding of what's going on. Please make sure I am using the correct terminology and describing things correctly, my friend.

1.) You mentioned earlier that each class of traffic that you assign to a no-drop system class will get its own set of buffers, thereby eliminating contention for buffer resources from the drop system class. So, for the default setup, in which FCoE is the only class of traffic that is no-drop, does that mean that FCoE gets half the 480KB interface buffer space, and that the drop system class gets the rest? In other words, is it a 50-50 split of the buffers? And what if another traffic class is added to the no-drop system class - let's say it's iSCSI traffic - will the iSCSI traffic get its equal share of the buffer space, too, such that now FCoE gets 33%, iSCSI 33% and the rest of the traffic in the drop class 33%?

2.) Using the language found in the IEEE ETS standard itself, they use the term "priority group" (PG) to differentiate the different types of traffic. Do those priority groups map to Cisco's system classes? I think they do. Cisco's PFC implementation speaks of traffic that belongs to System Classes drop and no-drop, and each of those classes can include traffic types from several different priority classes (CoS values). This is much the same way that ETS refers to priority groups, in which each group can consist of traffic belonging to several different classes of service, too. Thoughts?

3.) If you leave FCoE in the no-drop system class and leave everything else at the default drop system class, does this mean that, from the perspective of ETS, each system class is automatically assigned a minimum bandwidth of 50% - with the ability, of course, to leverage any unused bandwidth that the other system class may not be using at any given time? Moreover, is that bandwidth allotment configurable? For example, can you assign only 40% to the no-drop class and 60% to the drop class? Moreover, once again, let's say you configure another lossless traffic type, like iSCSI, and add it to the no-drop system class - will ETS distribute bandwidth to iSCSI such that FCoE gets 33%, iSCSI gets 33% and the rest gets 33%?

4.)  This isn't a question but more of a statement that I need validated by you. It seems that even though ETS can allow several different traffic priorities (CoS values) to be assigned to a single priority group, the bandwidth allocation mechanism of ETS does NOT assign bandwidth on a CoS level, but only on a priority group level, which is why the standard recommends that traffic classes with the same traffic congestion management requirements be placed in the same priority group.
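The "minimum share plus borrowing of unused bandwidth" idea in questions 3 and 4 can be sketched as a small fluid model in Python. This is a simplification under stated assumptions: real switches approximate it with a weighted scheduler working on frames, the function name `ets_allocate` is made up, and the group names and percentages are example values, not Cisco defaults beyond the 50/50 split mentioned earlier in the thread.

```python
def ets_allocate(link_gbps, shares, demand):
    """Split link bandwidth among priority groups.

    shares: group -> configured percentage (assumed to sum to 100)
    demand: group -> offered load in Gb/s
    Each group is offered bandwidth in proportion to its share; whatever a
    group does not need is re-offered to the groups that still want more.
    """
    alloc = {g: 0.0 for g in shares}
    active = {g for g in shares if demand.get(g, 0) > 0}
    remaining = float(link_gbps)
    while active and remaining > 1e-9:
        total_share = sum(shares[g] for g in active)
        satisfied = set()
        granted = 0.0
        for g in active:
            offer = remaining * shares[g] / total_share
            give = min(offer, demand[g] - alloc[g])
            alloc[g] += give
            granted += give
            if alloc[g] >= demand[g] - 1e-9:
                satisfied.add(g)
        remaining -= granted
        if not satisfied:
            break  # every active group took its full offer; the link is full
        active -= satisfied
    return alloc


# Both groups saturated: each gets exactly its configured share.
print(ets_allocate(10, {"no-drop": 40, "drop": 60}, {"no-drop": 10, "drop": 10}))
# The no-drop group is mostly idle: the drop group borrows the unused bandwidth.
print(ets_allocate(10, {"no-drop": 50, "drop": 50}, {"no-drop": 2, "drop": 10}))
```

Note that the allocation is done per group, not per CoS value, which matches the point in item 4 that ETS hands out bandwidth at the priority-group level.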

What I am trying to do is take my understanding of PFC to the next level by understanding its interaction with ETS, since they both work together to provide a lossless fabric.

I'll attach a copy of the IEEE ETS standard doc. I think it will be useful in case you have not read it in a while.

Hatim, are you in NY?? If so, I owe you a steak dinner with all the fixin's - ahem, Halal, of course.

Thanks!
