
General impressions on UCS?

rrfield
Level 1

So we were "gifted" a UCS C250 M2 server.  Very nice of our Cisco Rep, who has been very good to us.

The server just got RMA'd due to a motherboard issue diagnosed by a remote TAC engineer.

How often does this happen?

During our TAC case, the first engineer did not seem to want to really troubleshoot the problem, rather just said "replace it".  An onsite tech came to swap parts (which we could have done, but whatever).  He was a contractor and had never worked on a UCS server before.  A few issues did arise and he had to call his tech guy at the office, who tried to get hold of TAC to help out.  This took hours.  When he did get hold of TAC, the engineer "helping" us was very distracted, didn't ask any questions, and was very short and snarky.  I had a few followup questions, which he answered with a link that had very little to do with the problem at hand.  We are still emailing back and forth, albeit very sporadically, trying to get a simple question answered (which was not answered properly because he didn't ask WHAT MODEL OF SERVER WE ARE RUNNING).

We are 0 for 1 on hardware and 0 for 1 on acceptable TAC support.

We are cutting a P.O. to upgrade our ESXi cluster within the next 6 weeks and want to give UCS a fair shake, but obviously our experience has not been a good one so far.

Is this normal?  If so then we will stick with HP servers.


richard.pugh
Level 1

Wow, sorry for the experience you have had, that is just terrible. I am not a Cisco employee but rather a user and reseller, and I can honestly tell you this is an experience that I have never seen before and one that will probably never happen again. Don't let this one incident influence your overall opinion. UCS is a great hardware solution and Cisco has a great TAC team. I would advise that you contact the vendor from whom you purchased this, or your Cisco Account Manager, and make him/her aware of this so they can escalate it internally, or ask the TAC to let you speak with a support manager about the issue at hand.

DAVE GENTON
Level 2

I won't comment on the support side, but I myself have not had TAC support like that in the past 15 years or so for sure. I would notify a manager there, and it WILL be rectified.

  That done, for the UCS this is not typical no.  I have been specializing in these deployments and traveling the USA every week and will be doing so at least until the end of the year.  I have seen about 3 RMA's I can recall, and all of them have been C series Chassis, none of them on the B series side (knock on wood) so far.   On the C series 1 was a MB like you said, and the 2 others were both related to the RAID card specifically, which again is on the MB.

  While the C series has some catching up to do with the B series on firmware for management and general bug fixes, those are my only rma's from hundreds of physical units shipped and installed.

  You mention comparing to HP servers.  When you get to the B series chassis based blade servers, their performance, functionality and manageability is beyond compare.  That goes for HP, IBM and Dell, all 3.

Good Luck!

d-

rrfield
Level 1

I did contact our local Cisco Rep, who was responsive as always.  We are a Cisco shop on the network side (and mostly Cisco on the Security side) and have always had good luck with TAC.  I'm glad to hear this is an anomaly and not the norm.

Thanks for your input.

rrfield
Level 1

I'm going to reply to my own post to make one correction: after reviewing our notes, I was mistaken, the first engineer DID troubleshoot the problem.  The frustration came from the SECOND TAC engineer, not the first.

In any event, our local Cisco Rep has been on the ball and has engaged her UCS support engineer to oversee the case.  We've always had good luck on the network and security sides with Cisco.  UCS is new to us, and I know it's a new area for Cisco.  Hopefully our UCS experience will more closely resemble what we've come to expect from Cisco going forward.

That's what us partners are here for !!  We have gone a long way to ensure UCS support is the best, and personally I have even been having internal Cisco SE/CSE engineers shadow my install and configure sessions for customers.  First, this is a new world for Cisco and a transition is to be expected; we're talking 18 months now, certainly less than 2 years, since they entered the server market for the FIRST time ever.  That said, they have come a long way on hardware and software/firmware features and bug fixes.  Again, I cannot speak much to TAC support or online PAK licensing, and that shouldn't be used to hang a specific product set either.  Those processes span all Cisco products, not just UCS.

  I have not met a single customer yet saying they are "going back to HP" or Dell or whatever vendor they had prior ONCE they are familiar with the UCS architecture and how it works, and mostly the UCSM interface and how to find/configure in there what you learned about the architecture.  Some names have changed, BMC to CIMC, and new drivers are out there now that have resolved most of the nagging issues from the beginning.  Anyone who prefers using 11 different applications in 11 windows to manage the 11 components affecting a single server blade certainly is not proficient in, nor aware of, most of the tools available in the UCS Manager single pane of mgmt.  I have heard and seen these same things before, but it changes after the person gets SEVERAL hours on the UCSM over a few weeks.  Firmware mgmt and maintenance policies alone can give you back many hours each week to use as personal time or towards other projects.

  For instance, the biggest complaint I hear, and usually the first one, is related to making changes in a service-profile that require a server reboot.  If the profile is associated to a server, the server immediately reboots, causing an outage for the time it takes to boot whatever OS is residing on it, or worse, a hypervisor and many VMs to boot up afterwards.  This person simply doesn't realize that you can apply a maintenance policy in less than 5 minutes affecting hundreds of servers or even more.  That 5 minutes includes creating unique schedules and everything.

  First, tell the default policy no immediate reboot, or create a scheduled or user ack'd policy; I do both since it takes less than 90 seconds.  The user ack policy flags me that changes are pending so I can click reboot when I please, if I want to do it any time other than right now.  Then for the scheduled policy, I tell it to use my weekend maint. window policy, which says Sat 2am run maintenance.  Now any changes requiring a reboot happen during the 2am Saturday/Sunday window while I am sleeping instead of rebooting servers and applying patches and/or changes.

  Ok, so now I take that policy and I apply it to my service profile template.  That template spawned 400 servers in my datacenter, and instantly all 400 of those servers are updated to apply those changes during the Sat 2am maint. window, and I have still only taken up 4 of those 5 minutes I spoke of.  It took me longer to type this........

  Any changes can be put in there, and couple it with host firmware packages and you can update drivers for NICs, adapters, SCSI or what have you, CIMC (BMC), even the BIOS if you want to.  Create server pools and you can apply one driver version, BIOS version, whatever you have, to certain groups if they are dynamic; if not, even simpler........That's JUST maint. policy.

  I usually create templates for NICs and HBAs, local disk policies, RAID etc. during initial install.  The customer creates an SP by right clicking my template and DONE, or creates a unique one on their own, but each component can pull in a template instead of doing manual config, like an ESXi vNIC template or a Windows 2008 R2 template.  Not only does the server boot and have all that information, ALL the VLANs, WWNs and SCSI initiators (if boot from SAN) are in there and it's just DONE !  Add VLANs to your network ??  LAN tab, vNIC template, manage VLANs, check the 2 new ones to add or whatever, save, DONE.  Every NIC on every server created with that template already has visibility for the new VLANs, no reboot, no messages to see, it's done.
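
To make the template idea concrete, here is a toy Python model of it. This is only an illustration of the behaviour described above, not UCS Manager code or its API; the class and policy names (`Template`, `ServiceProfile`, `RebootPolicy`) are made up for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class RebootPolicy(Enum):
    """Hypothetical names for the three behaviours described in the post."""
    IMMEDIATE = "immediate"   # disruptive change reboots the server right away
    USER_ACK = "user-ack"     # change waits until an admin acknowledges it
    SCHEDULED = "scheduled"   # change waits for the maintenance window


@dataclass
class Template:
    name: str
    policy: RebootPolicy = RebootPolicy.IMMEDIATE


@dataclass
class ServiceProfile:
    name: str
    template: Template                      # shared reference, not a copy
    pending: list = field(default_factory=list)

    def apply_change(self, change: str) -> bool:
        """Return True if the server would reboot now, else queue the change."""
        if self.template.policy is RebootPolicy.IMMEDIATE:
            return True
        self.pending.append(change)
        return False


# 400 profiles spawned from one template, as in the example above.
template = Template("esxi-template")
profiles = [ServiceProfile(f"sp-{i}", template) for i in range(400)]

# One edit on the template changes the behaviour of every bound profile.
template.policy = RebootPolicy.SCHEDULED
rebooted_now = sum(p.apply_change("bios-update") for p in profiles)
print(rebooted_now, len(profiles[0].pending))
```

The point of the sketch is the shared reference: every profile looks up the policy on its template, so a single change to the template is picked up by all 400 profiles without touching them individually.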

  How do you do these things in HP ??  Dual Fabric for FC and LAN ??  Failover ? in microseconds ??  How about the Cisco Palo Virtual Interface Card ?  Virtual NICs and HBAs carved to certain VLANs, as trunk, failover a-b, b-a, active/active 20GB, full QoS with 8 data lanes, 2 able to be Lossless, FC or FCoE having dedicated lanes ??  How is that done in HP server racks ?? I am very curious since they are usually just getting powered off for replacement when I am arriving at the customer site.  Have not seen one in production I could poke around and see how it works......Seen those Cisco switch blades in them before though.....

ghuey
Level 1

Our experience has been horrible, and let me say that on the network side of things we are a pretty large Cisco customer (Tier 2 ISP).

We have eight servers so far, and six I started provisioning and four of those had hardware related issues.  My experience with TAC on the server side has been hands down the worst support experience of any vendor I have dealt with.

My last issue was simply a bad RAM DIMM.  Case was opened on 7/18...not resolved until 8/17.  For one simple stupid chip.  Wasted weeks with back and forth emails of logs and screenshots.  When my support engineer finally conceded it was a bad chip, it took him over a week to get me a replacement chip.  When I called to inquire about the whereabouts I hear some "oh...oh...I forgot to click...on *unintelligible gibberish*...should show up in about ten days...thank you...gud bie".  Click.

Here is the best part, I get the chip in, we ship it out the next day, I go to lunch and come back to a voice mail from a terse lady with "Cisco Asset Recovery" asking me where the RMA'd bad chip is.  Really?  You take almost two weeks to get me a replacement part then have the nerve to call the day after I get the part asking for the bad part.

I cannot access most of the tools they ask me to try because there is some issue with my account not being tied to a contract, yet they cannot tell me the name of our TAC admin who can fix it.

Concerning ESX.  We just rolled out a small vSphere 4.1 farm and experienced a problem with all the ESXi hosts running on Cisco boxes.  Turns out the IPMI on the Cisco boxes was randomly filling up the logs on the ESXi hosts, preventing any configuration changes from being written to disk.  The only solution was to disable IPMI logging on the VM side of things, reset the ESXi services, then delete the logs from the disk.

We bit because of price, but it will not happen again.  Back to HP.

gballard
Level 1

I hate to say this, but I am afraid of our UCS. I think it's a great system, but we've had some glitches that just really make me nervous. A lot of this may be lack of education on my part, but the fact of the matter is that all our eggs are in one basket. I suppose it's a lot like a SAN, which has plenty of glitches as well. Regardless, I think this converged model is the future.

We haven't gotten a lot of hardware, but out of two blades we ordered one was bad. I have a friend and he's had a lot of hardware issues on new orders. One of our consultants says that as far as TAC goes, it's pretty good but in his experience the hardware support side is lacking. I have had one hardware call w/ TAC and it went very well though.

Gentlemen,

On behalf of Cisco I do apologize for any negative experiences you may have had.  Coming from TAC myself, we do try our best to fix/replace/resolve issues as quickly and efficiently as possible.  Our partner community is also quickly skilling up to assist with augmenting our own support teams, and we're growing our own TAC teams to accommodate the fast growth of UCS.

If there are any outstanding concerns with the handling of your case, please feel free to private message me with your Cisco SR # and I'll gladly look into it for you and try to turn your experience around.

We greatly value our existing and new customers, and really want to offer the best support experience that you each deserve.  Actions speak louder than words, so I will take back your feedback and investigate accordingly.

Regards,

Robert

Our experience has not been terrible, but not been stellar either.  We have a UCS with one blade chassis with 4 blades.  One of the blades displayed memory errors fairly early on.  Called support and got much the same runaround.  On the phone, emailing logs, emailing screenshots, moving DIMMs around, rebooting, upgrading HBA FW etc., etc.  Tech finally shipped a whole new bank's worth of memory.  Put the new memory in, didn't fix the issue.  Tech shipped a new motherboard, which seemed to fix the issue at first, no errors in UCS Manager, but the blade is now not posting all the memory.  I get informational messages on several DIMMs that say "DIMM DIMM_A6 on server 1/3 has an invalid FRU".  Guessing it has something to do with the replacement memory he sent me.  The tech I got on the phone did not seem very knowledgeable on UCS.  For instance, he wanted me to update HBA FW (not really sure why), but it wouldn't let us, threw an error.  He seemed lost as to how to get around it.  The error seemed pretty self explanatory, saying it was part of a FW policy and couldn't be changed.  He said he would get back to me, but, after getting off the phone, I just removed the policy so I could upgrade it and it went fine, but, of course, it didn't fix the issue.  Have basically given up on this issue; it's only a POC environment, so not really concerned about it and not willing to invest any more time trying to fix it.

Just the other day, one of the other blades had a local hard drive failure.  We're booting off the SAN so aren't even using the local disk, so I just pulled it out and rebooted the server.  The error kept coming up in UCS manager and wouldn't go away.  I acknowledged it, but it still says the drive is bad, even though I pulled it out.  I've ended up resetting the CIMC to get it to finally clear.

So, four blades, have had hardware issues on two in just over a year of barely any actual use and my one experience with support did not go well.  I've heard good things about TAC from our network guys, so hopefully this was an anomaly.  Our local sales and tech guys have been very responsive, so that is good.

I have to say I have not had any training as of yet, so that might be part of my trepidation with UCS and as we get more familiar with it, might end up liking it, but as of now, little nervous about it.  Also nervous about how we're going to get all the teams (network, storage, server) to work in this common environment.  We're very siloed right now, which seems to be taboo if you listen to everyone out there, but it works really well for us now, the teams interact well, just stay in our areas.  Not sure how we're going to break down those silos and if there is even any real benefit to doing so.

Just a few thoughts, brand new UCS production gear is on the way, so we'll see how it goes.

Bob

Robert,

On that invalid FRU error: I got the same thing on a couple of batches of memory. It turns out the factory was packaging mismatched pairs of DIMMs together. The only way to see them without opening up the blade and reading letters on the DIMMs was in a show tech Chassis all detail.

Ex.

B4 2A| 4|2|2010W16|Samsung|Samsung 00|IDT     61|5550 0C 003C 69 1|1333|M393B5170FH0-YH9

B5 2A| 4|2|2011W16|Samsung|Samsung 00|Inphi   11|5550 0C 003C 69 1|1333|M393B5170FH0-YH9
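
One way to spot such a mismatch without eyeballing the output is to compare the two lines field by field. A minimal Python sketch, assuming the lines are pipe-delimited as above and that the first field is only the slot label; the meaning of the individual columns is not stated in this thread, so the code reports column positions rather than names:

```python
def split_fields(line):
    """Split a pipe-delimited DIMM line and collapse runs of spaces."""
    return [" ".join(part.split()) for part in line.split("|")]


def diff_pair(line_a, line_b):
    """Return (column, value_a, value_b) for every column that differs.

    Column 0 is skipped because it is assumed to hold the slot label,
    which is expected to differ between the two DIMMs of a pair.
    """
    a, b = split_fields(line_a), split_fields(line_b)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if i > 0 and x != y]


# The two example lines quoted above.
b4 = "B4 2A| 4|2|2010W16|Samsung|Samsung 00|IDT     61|5550 0C 003C 69 1|1333|M393B5170FH0-YH9"
b5 = "B5 2A| 4|2|2011W16|Samsung|Samsung 00|Inphi   11|5550 0C 003C 69 1|1333|M393B5170FH0-YH9"

for column, left, right in diff_pair(b4, b5):
    print(f"column {column}: {left!r} vs {right!r}")
```

For the pair above this reports differences in columns 3 and 6 (`2010W16` vs `2011W16`, and `IDT 61` vs `Inphi 11`), even though the part number in the last column is identical.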

Harold,

Thanks for the info.  I've got the output from the show tech command, which file would I find the info in?

Thanks,

Bob

Bob,

RankMarginTest file has the above information.

It is located under

//var/nuova/BIOS/

B250 has strict requirements on memory population and the above output would be helpful to verify whether paired DIMMs are identical or not.

B230 with latest firmware will not have information as it does not have such stringent requirements.
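
If it helps to locate the file inside a downloaded bundle, a small Python sketch using the standard `tarfile` module can list every member whose path contains `RankMarginTest`. The bundle file name in the usage note is a placeholder, and this only looks at the top-level archive; if the bundle contains nested archives (an assumption, not confirmed here), each of those would need to be extracted and searched the same way.

```python
import tarfile


def find_members(fileobj, needle="RankMarginTest"):
    """Return the names of tar members whose path contains `needle`.

    `fileobj` is a seekable binary file object; mode "r:*" lets tarfile
    detect plain, gzip, bzip2 or xz compression on its own.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return [member.name for member in tar.getmembers() if needle in member.name]
```

Usage would look like `with open("techsupport.tar", "rb") as f: print(find_members(f))`, where `techsupport.tar` stands in for whatever the show tech command produced.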

HTH

Padma

Padma,

Thanks for the reply.  Sorry, you'll have to be a bit more specific as I am a newbie to UCS.  I opened the tar file created by the show tech support command, but I don't see the file you are talking about in the tar.  Are you talking about being on an SSH session to the FI?  Can you give me the exact command I need to run?

Thanks,

Bob

For new or specific support questions, I'd suggest starting a separate discussion.

Cheers,

Robert
