What would cause this problem?

thomAZ · September 2009

I have a Windows 2003 server connected to a Cisco 3550 switch. The switch has very little configured: no VLANs are used except VLAN 1, and no VTP or CDP. The PC is plugged directly into the switch, and when I ping the server I get about 8 straight replies, then about 3 timeouts before it replies again. I changed cables and ports with no success. What do you think would cause this?

fleck · September 2009

Uneducated guess, but could it be refusing to ping back to prevent flooding? I don't think that's likely, but I might as well guess at it.

Oh yeah, is that a pattern, or is it steady after those initial timeouts?

leefdaddy · September 2009

Have you verified the NIC is good? Does it not have this issue with another switch?

thomAZ · September 2009

fleck wrote: »

Uneducated guess but it could be refusing to ping back to prevent flooding? I don't think that's likely but I might as well guess at it.

Oh yeah is that a pattern or is it steady after those initial timeouts?

No, it continues in that pattern. Not exactly the same each time, but it keeps dropping packets intermittently.

thomAZ · September 2009

leefdaddy wrote: »

Have you verified the NIC is good? Does it not have this issue with another switch?

We have multiple servers connected to this switch as well as a couple of PCs. The other servers are fine. I switched ports and cables and still no good. We also rebooted the server just to see what happened, still nothing.

fleck · September 2009

Without any certs or Cisco-specific switch experience, if that was happening to me I would assume that there is a problem with the server itself. Whether a wired NIC would do that because of a hardware defect, rather than bad drivers or configuration, I don't know. I would try the same steps in Safe Mode, then try connecting the server to another DCE and see if it behaves differently there. Then I would try reinstalling device drivers and network protocols, or just flat out reformat the server to see whether there was some weird networking configuration problem or some service attempting to run and killing the connection (though a service abruptly trying to send large amounts of data should usually slow down the ping latency, not kill it outright). Wondering where the Cisco certified guys are at right now.

Bl8ckr0uter · September 2009

thomAZ wrote: »

Got a windows 2003 server connected to a cisco 3550 switch. The switch has very little configured, no vlans are used except 1, no vtp or cdp. The PC is plugged directly into the switch and when I ping the server I get about 8 straight replies then I get about 3 timeouts before it replies again. I changed cables and ports but with no success. What do you think would cause this?

Can the server ping the PC OK? Have you used Wireshark to check what kind of traffic is flowing at that time? How about tracert or traceroute to see if there are any areas of network slowness?

networker050184 · September 2009

fleck wrote: »

Wondering where the Cisco certified guys are at right now

Well it doesn't sound like a Cisco issue to me. If all other devices on the switch are working fine and multiple ports have the same issue I'd be confident in ruling out the switch.

There you go, the opinion of a Cisco certified guy

shednik · September 2009

networker050184 wrote: »

Well it doesn't sound like a Cisco issue to me. If all other devices on the switch are working fine and multiple ports have the same issue I'd be confident in ruling out the switch.

There you go, the opinion of a Cisco certified guy

Yea but who says we can trust you mr. networker

I'd have to agree as well so you have the vote from 2 Cisco certified guys!

leefdaddy · September 2009

thomAZ wrote: »

We have multiple servers connected to this switch as well as a couple of PCs. The other servers are fine. I switched ports and cables and still no good. We also rebooted the server just to see what happened, still nothing.

Then you haven't verified if the nic is good or not... try a new one, try some updated drivers... try something

thomAZ · September 2009

leefdaddy wrote: »

Then you haven't verified if the nic is good or not... try a new one, try some updated drivers... try something

I kinda figured it wasn't the switch, but I don't know much about servers because I have 3 years of experience with switches and absolutely no experience with servers. That said, with this particular server I do not have an account to log on with (if you're in the military you would know what I mean). This server is remotely managed from a couple thousand miles away (great idea, right?). The GS worker helping me from the remote end said he would give me a "limited, temporary" account tomorrow, so we'll see what I find. I wonder what he thinks I can do that he can't, besides swap out the NICs. Thanks everyone for all of your input, it has definitely helped.

Solaris_UNIX · September 2009

thomAZ wrote: »

Got a windows 2003 server connected to a cisco 3550 switch. The switch has very little configured, no vlans are used except 1, no vtp or cdp. The PC is plugged directly into the switch and when I ping the server I get about 8 straight replies then I get about 3 timeouts before it replies again. I changed cables and ports but with no success. What do you think would cause this?

This sounds like it might be a physical layer problem (e.g. a bad cable or NIC).

Do you have a Linux live DVD or live CD (like an Ubuntu DVD or a Knoppix or Slax / Slackware live CD)?

Boot up into the live CD / live DVD, add your IP address, subnet mask and default router / default gateway to get on the network, then do a "zero interval flood ping" against the Windows server using the "ping -f" switch in the GNU version of the ping command. If you're not familiar with the "flood ping" switch in the GNU / Linux version of ping, here is the ping(8) man page for it that might help to explain what I'm talking about:

ping(8) - Linux man page

You need to become "root" to get the level of access you need for low level network troubleshooting commands, and you usually want to add the "-f" switch to ping for a "flood ping" with no specific interval specified (i.e. a "zero interval") and then add the "-c" switch to flood the target server with a predetermined number of ICMP echo request packets. So if the server IP address was 123.45.67.89, you could do something like this:

$ su - root

# ping -f -c 99999 123.45.67.89

to launch a massive overwhelming flood of 99,999 ICMP echo request packets at the server (these packets should go out at a rate of at least 100 packets per second if not more). Then see what percentage of ICMP echo reply packet loss you get in your result. If you get a high percentage of packet loss in your result, then replace the cable and then repeat the test and then compare the results. If the problem persists at the exact same percentage of packet loss for several different cables, then try replacing the NIC on the server and try using a different switch port. Other tools such as ttcp and web100 are also useful for troubleshooting these kinds of problems. I think web100 was originally designed for high performance super-computing and you can find more about it here:

The Web100 Project

It's a good tool to use for troubleshooting performance issues at the physical and data link layers, but setting it up properly will usually require you to have a BSD or Linux "speed test" server up on the local network somewhere. You can run the web100 client on a Windows 2003 server using Cygwin (good luck trying to get management to let you install Cygwin on their Windows server though)
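To compare runs across cable swaps, it helps to pull the loss figure out of each ping summary rather than eyeball it. Here is a minimal Python sketch of that, assuming the iputils-style summary line that Linux ping usually prints; the function name and the sample numbers are made up for illustration and are not from this thread.

```python
import re


def parse_ping_summary(summary: str) -> dict:
    """Extract sent/received counts from an iputils-style ping summary line.

    Assumes a line shaped like
    "N packets transmitted, M received, X% packet loss, time Tms";
    other ping implementations may word it differently.
    """
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary
    )
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    # Recompute the loss ourselves so the result is not rounded by ping.
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return {"sent": sent, "received": received, "loss_pct": loss_pct}


# Illustrative summary lines for two cables (invented numbers).
cable_a = "99999 packets transmitted, 72727 received, 27% packet loss, time 61234ms"
cable_b = "99999 packets transmitted, 72810 received, 27% packet loss, time 60987ms"

for label, line in (("cable A", cable_a), ("cable B", cable_b)):
    stats = parse_ping_summary(line)
    print(f"{label}: {stats['loss_pct']:.2f}% loss")
```

If the loss stays about the same across several cables, that points away from the cable and toward the NIC, the port, or something on the server itself.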

networker050184 · September 2009

Without access to the server, I'd put a span port up to prove the ICMP traffic is being sent from the host and being delivered to the server. If you do both sides you can easily verify where the traffic is being dropped. Then you can send that to the server guy and tell him to fix it.

thomAZ · September 2009

networker050184 wrote: »

Without access to the server, I'd put a span port up to prove the ICMP traffic is being sent from the host and being delivered to the server. If you do both sides you can easily verify where the traffic is being dropped. Then you can send that to the server guy and tell him to fix it.

Sorry for my lack of knowledge, span port?

Solaris_UNIX · September 2009

Another remote and unlikely possibility (which I thought I would mention anyway since you said you already swapped out the cable and tried a different port) might be that there is a duplex mismatch between the switch port and the NIC. Usually this happens with older Cisco switches when you have a quirky / buggy NIC and the auto-negotiation process fails, so the Cisco switch port falls back to half duplex but the NIC is still trying to operate at full duplex. This kind of duplex mismatch can result in massive network performance degradation in servers.

Have someone check if the NIC on the Windows server is at half duplex or full duplex, and make sure they also take note of what speed the server's NIC thinks it's running at (e.g. 10 Mbps, 100 Mbps, or gigabit). Then log in to the Cisco switch. If you have a newer IOS version you can use this command:

show interfaces status

and it will give you a nice little print out of the speed and duplex settings on all of the different switch ports. If you have an older IOS version, that command won't work, so you'll have to do it the old fashioned "sh int Fa0/1" way and pipe to "begin" or "include" if you don't want to read through the whole Cisco phone book of network interface information.

Here's more info about duplex mismatches on Cisco switches:

https://cisco.hosted.jivesoftware.com/thread/4506

When you do a "show interfaces fa0/x" command, where x is the switch port that the server is connected to, look to see if the number of "runts" is incrementing over time. Also look to see if the number of collisions, late collisions, and CRC errors is incrementing over time as well as this might give you hints as to whether the problem is in the data link layer connection between the server and the switch port or if the problem is somewhere else in the network.

Here's more info about it from Cisco's official web site:

Troubleshooting Cisco Catalyst Switches to NIC Compatibility Issues - Cisco Systems

Troubleshooting Switch Port and Interface Problems - Cisco Systems
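Since the point is whether the counters are incrementing, not their absolute values, one way to check is to save the "show interfaces" output twice, a few minutes apart, and diff the numbers. Below is a rough Python sketch of that idea. The sample text is an illustrative excerpt in the usual shape of that output, not a capture from this switch, and the exact wording can differ between IOS versions, so treat the patterns as assumptions.

```python
import re

# Patterns for the counters mentioned above. The wording is assumed from
# typical "show interfaces" output and may need adjusting for your IOS.
PATTERNS = {
    "runts": r"(\d+) runts",
    "crc": r"(\d+) CRC",
    "collisions": r"(\d+) collisions",
    "late_collisions": r"(\d+) late collisions?",
}


def parse_counters(show_int_text: str) -> dict:
    """Pull the error counters of interest out of one output snapshot."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, show_int_text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def increments(before: dict, after: dict) -> dict:
    """Return only the counters that grew between the two snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


# Illustrative snapshots (invented values).
BEFORE = """
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
"""
AFTER = """
     12 runts, 0 giants, 0 throttles
     7 input errors, 7 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 3 late collision, 0 deferred
"""

print(increments(parse_counters(BEFORE), parse_counters(AFTER)))
```

An empty result between snapshots taken while the pings are failing would suggest the drops are not happening at the data link layer on that port.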

thomAZ · September 2009

Solaris_UNIX wrote: »

Another remote and unlikely possibility (which I thought I would mention anyway since you said you already swapped out the cable and tried a different port) might be that there is a duplex mismatch between the switch port and the NIC. Usually this happens with older CISCO switches when you have a quirky / buggy NIC and the auto-negotiation process fails so the CISCO switch port falls back to half duplex but the NIC is still trying to operate at full duplex. This kind of duplex "mis-match" can result in massive network performance degradation in servers.

Have someone check if the NIC on the Windows server is at half duplex or full duplex and make sure they also take note of what speed the server's NIC thinks it's running at (i.e. 10mbps or 100mbps or gigabit, etc.) Then log in to the CISCO switch. If you have a newer IOS version you can use this command:

show interfaces status

and it will give you a nice little print out of the speed and duplex settings on all of the different switch ports. If you have an older IOS version, that command won't work, so you'll have to do it the old fashioned "sh int Fa0/1" way and pipe to "begin" or "include" if you don't want to read through the whole Cisco phone book of network interface information.

Here's more info about duplex mismatches on Cisco switches:

https://cisco.hosted.jivesoftware.com/thread/4506

When you do a "show interfaces fa0/x" command, where x is the switch port that the server is connected to, look to see if the number of "runts" is incrementing over time. Also look to see if the number of collisions, late collisions, and CRC errors is incrementing over time as well as this might give you hints as to whether the problem is in the data link layer connection between the server and the switch port or if the problem is somewhere else in the network.

Here's more info about it from Cisco's official web site:

Troubleshooting Cisco Catalyst Switches to NIC Compatibility Issues - Cisco Systems

Troubleshooting Switch Port and Interface Problems - Cisco Systems

I did check this yesterday and manually set the port to 100/full, still no success. I'll keep digging... Oh, and there are no errors on the port.

networker050184 · September 2009

thomAZ wrote: »

Sorry for my lack of knowledge, span port?

Check here.

It will basically take all traffic going in/out of a port and push it out another port. You can hook up wireshark there and see all the communications.
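Once you have captures from both sides, the comparison itself is simple: match ICMP sequence numbers and see where each one was last seen. Here is a small Python sketch of that logic. It assumes you have already exported the sequence numbers of echo requests and replies from each Wireshark capture into sets (how you export them is up to you), and the function name and verdict strings are made up for illustration.

```python
def locate_drops(req_at_pc_port, req_at_server_port,
                 rep_at_server_port, rep_at_pc_port):
    """Classify each echo request by where it, or its reply, disappeared.

    Each argument is a set of ICMP sequence numbers observed in a capture:
    requests seen on the PC's port, requests seen on the server's port,
    replies seen on the server's port, replies seen on the PC's port.
    """
    verdicts = {}
    for seq in sorted(req_at_pc_port):
        if seq not in req_at_server_port:
            verdicts[seq] = "request lost between the two switch ports"
        elif seq not in rep_at_server_port:
            verdicts[seq] = "request delivered, server sent no reply"
        elif seq not in rep_at_pc_port:
            verdicts[seq] = "reply lost between the two switch ports"
        else:
            verdicts[seq] = "ok"
    return verdicts


# Invented example: 11 pings, the server never answers the last three.
requests = set(range(1, 12))
replies = set(range(1, 9))
result = locate_drops(requests, requests, replies, replies)

for seq, verdict in result.items():
    if verdict != "ok":
        print(seq, verdict)
```

If every failed ping lands in the "server sent no reply" bucket, that is the evidence to hand to the server guy.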

thomAZ · September 2009

networker050184 wrote: »

Check here.

It will basically take all traffic going in/out of a port and push it out another port. You can hook up wireshark there and see all the communications.

Ohh OK, port monitoring... I know what you mean now, thanks.

Solaris_UNIX · September 2009

thomAZ wrote: »

I did check this yesterday and manually set the port to 100/full, still no success. I'll keep digging....... ohh and no errors are on the port.....

OK, so there are no runts, no collisions, no late collisions and no CRC errors incrementing on the port that the server is on.

What about the port that the PC is on? Are there any runts or collisions / late collisions or CRC errors there? Log in to the Windows Server 2003 machine and type in this command:

netsh diag ping gateway

and type in the same command:

netsh diag ping gateway

on the PC as well and compare the output. Are you seeing the same amount of packet loss pinging the gateway / default router from both the server and the PC?

Also try typing in these commands:

netsh diag ping dns

and

netsh diag ping adapter

on the PC as well a few times and look for any packet loss (these "netsh diag" commands only work in Microsoft Windows XP and Windows Server 2003 AFAIK. For some reason Microsoft decided to remove the very useful "netsh diagnostics" context from Windows Vista and Server 2008, which is too bad because they made scripting and troubleshooting over the phone much easier).

If you're clever, you can write a batch file or script that just pings the gateway / default router and other hosts on the switch over and over again from both computers and checks for packet loss.
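As a starting point for that kind of script, here is a hedged Python sketch rather than a batch file. It shells out to the system ping one packet at a time and tallies the results. The flags are the usual ones for Windows ping (-n, -w) and Linux iputils ping (-c, -W); the exit code is only a rough success signal, since Windows ping can return 0 for some "unreachable" replies. The demo at the bottom uses a fake probe that replays the 8-replies-then-3-timeouts pattern from the first post, so it runs without touching the network.

```python
import itertools
import platform
import subprocess


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request with the system ping; True if it exits with 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux iputils flags assumed; other systems may differ.
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    done = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return done.returncode == 0


def loss_percent(results) -> float:
    """Percentage of probes that failed."""
    return 100.0 * results.count(False) / len(results) if results else 0.0


def longest_timeout_run(results) -> int:
    """Length of the longest streak of consecutive failures."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def survey(host: str, count: int, probe=ping_once) -> list:
    """Probe one host `count` times and return the list of True/False results."""
    return [probe(host) for _ in range(count)]


# Demo with a fake probe (no network needed): 8 replies, then 3 timeouts.
pattern = itertools.cycle([True] * 8 + [False] * 3)
demo = survey("192.0.2.10", 22, probe=lambda host: next(pattern))
print(f"{loss_percent(demo):.1f}% loss, longest timeout run: {longest_timeout_run(demo)}")
```

To run it for real, call survey() with the gateway's or server's address and leave out the probe argument, once from the PC and once from the server, then compare the two loss figures.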

phoeneous · September 2009

Just for reference, if you have CLI access to the switch, run the "sh log" command and paste the output here.

Solaris_UNIX · September 2009

Maybe another unlikely possibility is that the switch is being over-saturated with traffic? What do you see as your output for the "CPU utilization", "Load Meter", etc. using this command?

Switch> en

Switch# sh proc
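If you end up pasting that output somewhere for later comparison, the CPU line at the top is the part worth tracking. A tiny Python sketch for pulling the numbers out follows; it assumes the usual "CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%" header line, and the sample values are invented.

```python
import re


def parse_cpu_utilization(show_proc_text: str) -> dict:
    """Extract the CPU figures from the header line of the process listing."""
    match = re.search(
        r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
        r"one minute: (\d+)%; five minutes: (\d+)%",
        show_proc_text,
    )
    if match is None:
        raise ValueError("CPU utilization line not found")
    five_sec, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "five_sec_total": five_sec,      # total CPU over the last 5 seconds
        "five_sec_interrupt": interrupt,  # portion spent at interrupt level
        "one_min": one_min,
        "five_min": five_min,
    }


# Illustrative header line (made-up numbers).
sample = "CPU utilization for five seconds: 8%/2%; one minute: 6%; five minutes: 5%"
print(parse_cpu_utilization(sample))
```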

thomAZ · September 2009

So I checked the server, even though I'm not a server guy (they're not in for a couple of days), and I found that the two network connectors are teamed together. When only one of the NICs is enabled, the server works fine and all pings get replies. When both NICs are enabled, packets are dropped intermittently as mentioned before. I checked this at the end of the day, so I will continue tomorrow with checking the team connection configuration. Has anyone else experienced this with team connections?

Ohh, and to add: I enabled the NICs one at a time, alternating between them, to see if maybe one of the NICs is bad. They both work, because if I leave either one of the NICs enabled on its own it pings fine, just not with both.

Turgon · September 2009

thomAZ wrote: »

So I checked the server, even though I'm not a server guy(they're not in for a couple of days), and i found that the two network connectors are being teamed together. Now when one of he NICS are enabled the server works fine and all pings are well. When both NICS are enabled packets are dropped intermittently as mentioned before. I checked this at the end of the day so I will continue tomorrow with checking the team connection configuration. Has anyone else experienced this with team connections?

Ohh and to add, I enabled both NICS one at a time alternating both to see if maybe one of the nics are bad. They both work because if I leave either one of the NICS enabled it pings fine, just not both.

Yes, I wondered if you were using teamed NICs. I have seen numerous problems with teaming on Windows, Linux, and Solaris; however, it wasn't clear to me in this thread how many NICs were involved in this server.

Check out these links for some insights:

IT Resource Center forums - Teaming NIC problems after Win 2003 upgrade on DL360

Network adapter teaming and server clustering
