What a supposed CCIE is asking to do to my live core network (Cisco 6500 VSS)
Comments
-
DevilWAH Member Posts: 2,997 ■■■■■■■■□□OK, this has gone way off topic (partly my fault). The issue is the lack of strategy and the answers that come from questioning his ideas.
But more importantly, he is paid to figure out what questions to ask.
-
deth1k Member Posts: 312We've only seen one side of the story. The way you describe it is that he - the CCIE - was brought in to fix something you / your colleagues couldn't, yet you are doubting his abilities. Each tech has their own approach to a problem; it doesn't have to follow ITIL rules or whatever seems logical to you or anyone else. I can see you arguing your point, however you know better how your network is set up, and therefore you should try to get along with this guy and help him as much as you can rather than being offensive. I can see your point, he is being paid to do this, but don't forget he walked into this blindfolded, whereas you know every corner of your network.
-
DevilWAH Member Posts: 2,997 ■■■■■■■■□□Reading back, I agree that my posts might sound like I am dismissing the guy as a joke, but that is far from the truth. We have been working together for a while now, and he is not without skills or positives. This thread was started about a specific point / suggestion he made that, with some basic thought, he would have noticed was not really a good idea.
Unfortunately, reading back, it might sound like I am dismissing the guy out of hand, or not discussing this with him. We are still working together and I still value his opinions.
What originated as a light-hearted post about some poor advice I was given (like I am sure we have all been given, and probably given, at some point in our careers) kind of went off topic and got a bit more serious, heading towards a general view I have of some engineers I have dealt with over the years.
So yes, we have discussed it, we have agreed that it's not the way forward, and I am still happy working with the guy. Is he the best engineer I have ever worked with? No. But he is far from the worst. If he was really that bad he would not still be on site.
-
DevilWAH Member Posts: 2,997 ■■■■■■■■□□@DevilWAH, quick question for ya, are all hosts contributing to the VSAN cluster? Apparently performance's poor if not all ESXi hosts in the cluster participate in the VSAN cluster. Also, from your other thread, have you had a chance to find out the PSP you've got going in ESXi? If it's Fixed, changing to Round Robin might just do it, because it seems like all the IOs being sent down one link are smashing it, resulting in poor performance.
Another question - is the 10GbE dedicated to VSAN, and have you looked at what VSAN Observer says about the situation?
And read this please > VMware vSphere Virtual SAN design considerations... and this for some performance related info >
Root cause analysis of my VSAN outage : vmware
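For reference, checking which PSP is in play and switching a device to Round Robin is roughly this from the ESXi shell (the device ID below is a made-up placeholder, and I'm assuming the stock NMP multipathing plugin):

esxcli storage nmp device list
# the output shows a "Path Selection Policy:" line per device, e.g. VMW_PSP_FIXED
esxcli storage nmp device set --device naa.600508b1001c5a3e --psp VMW_PSP_RR
# or make Round Robin the default for everything claimed by a given SATP
esxcli storage nmp satp set --satp VMW_SATP_DEFAULT_AA --default-psp VMW_PSP_RR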
Hi, sorry for the long delay, but a bit of an update.
They first moved to using a single 10Gig NIC; as we had only 3 guests running and were only pushing 3-5% of a single NIC, it was decided to go this route to eliminate issues. And yes, all hosts are part of the cluster.
So I decided to do a little test. Running their test (IOMeter) on a VDI client, I was seeing 10-15 ms latencies that they were suggesting were a network issue. Strangely, using the other 10Gig NIC on the same adapter, going into the same line card in the same switch, the same test on the same VDI was giving a latency of <2 ms to an NFS store on a NetApp.
I also took a packet capture from the host running the VDI to see the VSAN traffic flows. VSAN runs over TCP, and my capture was showing a network round-trip time of <2 ms max and <1 ms average.
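For anyone wanting to reproduce the measurement, it was along these lines (the vmk interface number is a placeholder, and I'm assuming vSAN's RDT traffic on its usual TCP port 2233):

# capture on the vSAN vmkernel interface from the ESXi shell
tcpdump-uw -i vmk2 -s 0 -w /tmp/vsan.pcap tcp port 2233
# then pull the per-ACK round-trip times out of the capture with tshark
tshark -r /tmp/vsan.pcap -Y "tcp.analysis.ack_rtt" -T fields -e tcp.analysis.ack_rtt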
I am still waiting for a response from them to explain the difference between the IOMeter and network response times.
Although they have just said that there is a known issue with VSAN and the SSD flushing to spindles causing bottlenecks, which they are meant to be getting a fix for.
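That SSD-to-spindle destaging is exactly the sort of thing VSAN Observer graphs, and it can be launched from RVC on the vCenter box, roughly like this (cluster name and credentials are placeholders):

rvc administrator@vcenter.local
vsan.observer ~/computers/MyCluster --run-webserver --force
# then browse to https://<vcenter>:8010 to watch the write buffer / destage graphs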
-
Essendon Member Posts: 4,546 ■■■■■■■■■■What kind of SCSI controller's in use? Gotta think about controller queues too, some entry level ones have a queue depth of like 20-something. Kinda okay for light traffic, can bottom out when loaded.
-
gorebrush Member Posts: 2,743 ■■■■■■■□□□He was not asked to configure VSS; I did that months ago and it's been working fine. He was asked to check out the network to see if it was the cause of poor performance on a storage issue. I would expect anyone, be they a CCENT or a CCIE, who is looking at a network and comes across something they have not experienced before to check it out before they suggest making fundamental changes to the config/setup.
You are saying that if you came (paid) to troubleshoot an issue running across my core, and you sat down, logged on, and found both chassis presenting as a single logical switch, and were told it was running as a VSS pair, you would not either take some time to read a whitepaper / the documentation on the technology, or at least ask me (the person who does know how to configure it) for the basics, before claiming to have a solution?
I was asked to look at some MPLS the other day; my response was "I haven't worked with MPLS, let me take this away and have a play and I will get back to you". I don't mind someone not knowing; I don't like someone throwing ideas at a problem without knowing.
Sorry if my post came across incorrectly - absolutely I'd come and talk to you!
I run across this a lot in my day job - I'm always coming up against topologies I don't know. I'll go and read/learn about them before doing anything that might... break anything -
lrb Member Posts: 526He could always be a CCIE in Security, DC, SP, Wireless, etc. Just because he is a CCIE in one area doesn't mean he has to know the whole Cisco portfolio or range of technologies.
-
DevilWAH Member Posts: 2,997 ■■■■■■■■□□What kind of SCSI controller's in use? Gotta think about controller queues too, some entry level ones have a queue depth of like 20-something. Kinda okay for light traffic, can bottom out when loaded.
It's the 710, which has a queue depth of 600 or something. I mentioned this to them and got "oh really, not come across that, will have a check" - this from the VMware experts. And it's not like it's a new issue; the queue depth limitation has been known about for ages.
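For anyone wanting to check their own setup, the queue depths are visible straight from the ESXi host, no vendor tools needed; something like:

esxcli storage core device list | grep -i "Queue Depth"
# shows the "Device Max Queue Depth" value per device
# or run esxtop, press 'd' for the disk adapter view, and read the AQLEN column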
It's just the feeling I get: anything like this that I come across on the internet and mention to them seems to be news to them. Not what I expect from a company claiming to be a leader in providing VMware solutions.
-
DevilWAH Member Posts: 2,997 ■■■■■■■■□□He could always be a CCIE in Security, DC, SP, Wireless, etc. Just because he is a CCIE in one area doesn't mean he has to know the whole Cisco portfolio or range of technologies.
As has been the common thread here, no one expects people to know everything. But if you are going to propose a major change to a live network, then you either find out about it or ask the person whose network it is. You don't put in an email "The next stage of troubleshooting is to make the following changes........", because when you send an email like that around to 15-odd people, it does not look great when the customer who is paying you to look into their network comes back saying "Umm, you know that is not possible and would take the network down, don't you??"
Had he come up to me (his desk was next to ours) and said "I was thinking we want to segregate the traffic on the core, could we do that by putting in a separate fibre between the cores and pruning the VLANs onto it...", I would have replied "I don't think we can; VSS does not allow much configuration of the inter-switch links, so we would have to look into that a bit I think...", and then we could have discussed VSS and whether it was possible or not. Saying "we are going to do this" suggests he knows what he is doing. I have no issue with people not knowing it all, as long as they don't act like they do.
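For anyone who hasn't met VSS: the VSL between the two chassis is just a port-channel flagged as the virtual link, along these lines (domain, port-channel and port numbers are examples only), and there is no per-VLAN pruning on it - it has to carry whatever the system needs between chassis:

switch virtual domain 100
 switch 1
!
interface Port-channel1
 description VSL to peer chassis
 switch virtual link 1
!
interface TenGigabitEthernet1/1
 channel-group 1 mode on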
PS: as mentioned before in this thread, he is a Routing & Switching CCIE.
-
it_consultant Member Posts: 1,903I have a similar setup in terms of having a virtual chassis and setting up iSCSI, and I ran into the problem of jumbo frames. On my Brocade Ethernet fabric I don't worry about the ISLs in terms of fragmented frames; Brocade set up a VMAX 40K with a VCS fabric and proved it out. So in that environment all I have to do is set up port-by-port access VLANs, set the MTU to 10K, and have a cup of coffee. In my other environment I have an IronStack of 6610s, and this hasn't been proven out by EMC. The IronStack is closer to what the VSS is - sort of a multi-chassis trunk, as opposed to a TRILL-based fabric. My conclusion was that it was unwise to enable 10K frames on the IronStack, because the uplink ports are really just low-latency 40Gb Ethernet ports with special QoS considerations on the backplane, whereas the VCS fabric uplinks are actual ISLs - similar to what you get in Fibre Channel networks.
All that being said, we simply purchased another 10Gb switch for the site with the IronStack and have it running iSCSI traffic, 10K frames and all - works bloody great. I can pump out nearly 10Gb of storage traffic through a Hyper-V virtual switch! This is the important thing, IMHO, of having good generalists. I could have told you straight away that if you intended on running iSCSI, all of your networking concerns should be around four things. The first two are jumbo frames and iSCSI multi-pathing. If you have MS hosts (although I understand this is a VMware network), the third is multiple connected sessions, if the storage system supports it. The fourth is perhaps getting NICs with iSCSI offload capabilities, but that is getting less and less important nowadays since the chipsets on 10Gb cards are really very good.
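One thing worth adding: whatever the switch side looks like, prove the jumbo path end to end with a don't-fragment ping sized just under the MTU (8972 bytes for a 9000-byte MTU, allowing 28 bytes of IP/ICMP headers; the target IP is a placeholder):

# from an ESXi host (the vmkernel port must already be set to the larger MTU)
vmkping -d -s 8972 10.0.0.50
# from a Windows host
ping -f -l 8972 10.0.0.50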
If we set a pure network engineer on this task, they will only get the pure networking part (in this case jumbo frames, which the CCIE did correctly identify as a problem, though it sounds like he didn't consider fragmentation), but not the rest of the stuff on the hosts and storage which makes things run well. -
DevilWAH Member Posts: 2,997 ■■■■■■■■□□The issue with VMware was not the network in the end, but the SSD disk that the company had spec'd up not being "VMware's best practice".
There is a full explanation in this post: http://www.techexams.net/forums/virtualization/104576-vmware-horizion-view-isseus-2.html#post898798
So after months of pointing at the network, they never once provided any solid evidence that it was actually at fault, apart from saying "it's not best practice", despite the same network running our own NFS storage system fine. In fact, they never acknowledged it was not the network; they have just gone very quiet on that front.
Not overly impressed with this company; they showed very little troubleshooting skill, got the idea it was the network, and like a dog with a bone would not let go or entertain the possibility it was something else. It was not till they finally agreed to get VMware involved directly that VMware looked through the specs they had designed for us and pointed out the SSD as the likely cause.