Troubleshooting network performance

WiseWun Member Posts: 285
I have a few end users complaining about network performance from time to time. I'll be going on-site to see exactly what is "slow". I've checked the circuit utilization and it's only 5%, latency is good. Also checked the interface connected to the PC, no sign of packet loss/congestion. Packet capture shows a few TCP resets/re-transmission (about 3%).

What am I missing here? What do you normally look for, based on experience? Any suggestions better than this would help!
"If you’re not prepared to be wrong, you’ll never come up with anything original." - Ken Robinson

Comments

  • silver145 Member Posts: 265
    Set up a few IP SLAs to track latency and jitter over a period of time and compare the results. Most users **** will not be loading fast enough, which is why they are complaining. If there are any real problems, this should root them out at a basic level.
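
A minimal sketch of the "track it over time and compare" idea above, assuming you can export round-trip-time samples from whatever probe you use. The sample values, the 1.5x tolerance, and the function names are all hypothetical and not taken from the thread. Jitter is computed here as the mean absolute difference between consecutive samples, which is one common definition, not necessarily the one your probe reports.

```python
from statistics import mean


def summarize_rtt(samples_ms):
    """Return (mean latency, mean jitter) in ms for an ordered list of RTT samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute jitter")
    # Jitter: mean absolute difference between consecutive samples.
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return mean(samples_ms), mean(deltas)


def exceeds_baseline(current, baseline, tolerance=1.5):
    """True if any current metric is more than `tolerance` times its baseline."""
    return any(c > b * tolerance for c, b in zip(current, baseline))


# Hypothetical RTT samples in milliseconds (illustrative only).
quiet_period = [6.0, 6.0, 7.0, 6.0, 7.0]
morning_login = [6.0, 30.0, 8.0, 45.0, 9.0]

baseline = summarize_rtt(quiet_period)
current = summarize_rtt(morning_login)
print("baseline:", baseline)
print("current: ", current)
print("worse than baseline:", exceeds_baseline(current, baseline))
```

Without a recorded quiet-period baseline there is nothing to compare against, so the first step is simply to start collecting.
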
  • RouteMyPacket Member Posts: 1,104
    What are they attempting to do? Browse the Internet or pull files from a file share? Two completely different things.

    Start with the basics

    1. How many users are affected? Any commonalities between them? Are they connected to the same switch, are they in the same IDF etc?
    2. Check "performance" at the PC level. Check the NIC settings: are they set to auto-negotiate or not? Run a traceroute to the Internet: do you see high latency at any of the hops? You can run throughput tests from their machine to, say, a file server using something like NetCPS or LAN Speed Test.

    3. Based on the above results, you can start digging into the switch/network etc.
    Modularity and Design Simplicity:

    Think of the 2:00 a.m. test—if you were awakened in the
    middle of the night because of a network problem and had to figure out the
    traffic flows in your network while you were half asleep, could you do it?
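
The throughput test suggested in step 2 above boils down to timing a transfer of known size and comparing the result with the port speed. A rough sketch of that arithmetic follows; the file size, copy time, and 100 Mbps link speed are hypothetical example values, and the function names are made up for illustration.

```python
def throughput_mbps(bytes_transferred, seconds):
    """Convert a timed file copy into megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / seconds / 1_000_000


def link_efficiency(measured_mbps, link_mbps):
    """Fraction of the nominal link speed that the copy achieved."""
    return measured_mbps / link_mbps


# Hypothetical measurement: a 100 MB test file copied in 20 seconds
# across a port that negotiated 100 Mbps.
measured = throughput_mbps(100_000_000, 20.0)
print(f"{measured:.1f} Mbps, {link_efficiency(measured, 100):.0%} of link speed")
```

A result far below link speed on an otherwise idle port points toward the endpoint, duplex settings, or the server rather than the circuit.
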
  • WiseWun Member Posts: 285
    silver145 wrote: »
    Set up a few IP SLAs to track latency and jitter over a period of time and compare the results. Most users **** will not be loading fast enough, which is why they are complaining. If there are any real problems, this should root them out at a basic level.

    That's the problem: I don't think there is any baseline available. I just joined the company, but I'll look into it. Thanks for the suggestion.

    RouteMyPacket wrote: »
    What are they attempting to do? Browse the Internet or pull files from a file share? Two completely different things.
    Start with the basics

    1. How many users are affected? Any commonalities between them? Are they connected to the same switch, are they in the same IDF etc?
    2. Check "performance" at the PC level. Check the NIC settings: are they set to auto-negotiate or not? Run a traceroute to the Internet: do you see high latency at any of the hops? You can run throughput tests from their machine to, say, a file server using something like NetCPS or LAN Speed Test.

    3. Based on the above results, you can start digging into the switch/network etc.

    Mainly file share, email, and company intranet. They don't do much browsing, but when "slowness" occurs, it takes time to open the page. The user who reported the issue says other staff experience slowness from time to time, usually in the AM when they first log in. They all work on the same floor and are on the same switch/VLAN.

    I checked Task Manager; CPU and memory are normal. I couldn't check the NIC settings because I needed admin rights, but the switch port is set to auto-negotiate (100 Mb full duplex). Traceroute response time from my workstation to their PC was about 6 ms, and each hop within the SP MPLS network showed no issues. Any traceroute/ping/nslookup to the Internet is blocked by the proxy.

    We have hundreds of sites, some small, some large. We know some are over-utilizing their circuits and need to be upgraded. But it's the sites that under-utilize the circuit yet complain about performance that are the puzzle. It's most likely related to the workstations, which is why I need to be on site.
  • rcsoar4fun Member Posts: 103
    The other one to check is "show controllers". Physical wiring could be causing issues for the user.

    Might not be a bad idea to build an MRTG box and start looking at the interface traffic.
  • WiseWun Member Posts: 285
    rcsoar4fun wrote: »
    The other one to check is "show controllers". Physical wiring could be causing issues for the user.

    Might not be a bad idea to build an MRTG box and start looking at the interface traffic.

    It will be interesting to see the output of this command; I'll let you know the results tomorrow. Not too sure about MRTG, I'll talk to my peers. Have you had success with such an implementation?
  • rcsoar4fun Member Posts: 103
    What hardware are you running?

    MRTG is cheap and flexible. What is not to like? :)

    The problem with just looking at the interface is you never know what happens between the averages. I had an admin that decided to reindex Sharepoint every 5 minutes, which would toast two domain controllers/DNS servers for 30 seconds every 5 minutes. They wanted to know "What is wrong with the network".

    Also, "show proc cpu sort" will tell you which processes are utilizing the most resources. "show proc cpu history" will show what the CPU was doing when the slowness occurred and if there is some sort of cyclic process causing issues. Any output drops on the server interface?

    Show controllers is very telling. For instance, if you are seeing collisions on a full duplex port, something is broke. Late collisions, something is really broke.
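
To put numbers on the point above about never knowing what happens between the averages, here is a small worked example. It assumes a 5-minute polling window in which a link is saturated for 30 seconds and otherwise idles at 5%; the per-second figures are hypothetical and only meant to mirror the shape of the reindexing story.

```python
def window_average(samples):
    """Average of all samples in one polling window."""
    return sum(samples) / len(samples)


# Hypothetical per-second utilization (%) over one 5-minute polling window:
# 30 seconds saturated at 100%, the remaining 270 seconds idling at 5%.
window = [100.0] * 30 + [5.0] * 270

avg = window_average(window)
peak = max(window)
busy_seconds = sum(1 for s in window if s >= 90.0)

print(f"average {avg:.1f}%, peak {peak:.0f}%, seconds at >=90%: {busy_seconds}")
# prints: average 14.5%, peak 100%, seconds at >=90%: 30
```

A graph built from 5-minute averages would show roughly 14.5% and look healthy, even though users were stalled for 30 seconds of every window. Shorter polling intervals or peak counters are what expose this kind of cyclic burst.
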
  • WiseWun Member Posts: 285
    hehehe @ the sharepoint admin reindexing every 5 min, that's crazy! I think the default is 15 or 20 mins. Here's the show commands output. The switches are mostly 3560 but we will be upgrading soon.

    The output is from the interface that the reported user is connected to via a Cisco IP phone. I reset the counters 6.5 days ago and there are currently 5235 drops, so I need to find out the reason for the discards; there aren't any CRC or collision errors. I notice output drops on all 19 switchports, ranging from 150 to 6k. I was advised to restart the switch because they thought it was faulty, but that didn't help. I just did a traceroute and it was 1 ms, so the issue only happens once in a while.

    Last week I noticed 90% of the traffic captured was to/from a corporate server (maybe an update server). I found out where the server was located but didn't get a chance to look at the interface. I need to chase down these server guys. My previous company was a pure ITIL shop, so every asset we had was in one database (CMDB) and right away I would know the contact person, its function, location, etc. Maybe the issue is the VoIP phones? I'm going to try a direct connection and bypass the phone.
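
A raw drop counter is hard to judge on its own, so one way to read the figures above is to normalize them. The sketch below uses the 5235 drops and 6.5 days from the post; the transmitted-packet count is a hypothetical placeholder (the post does not give one), and treating "drops plus packets sent" as the number of attempted packets is an assumption about how the counters relate.

```python
def drops_per_day(drops, days):
    """Average number of drops per day since the counters were cleared."""
    if days <= 0:
        raise ValueError("days must be positive")
    return drops / days


def drop_ratio(drops, packets_out):
    """Output drops as a fraction of all packets the port tried to send.

    Assumes dropped packets are not included in packets_out.
    """
    attempted = drops + packets_out
    return drops / attempted if attempted else 0.0


# 5235 drops over 6.5 days, as reported in the post.
print(f"{drops_per_day(5235, 6.5):.0f} drops per day")

# Hypothetical transmitted-packet counter, for illustration only.
packets_out = 12_000_000
print(f"drop ratio: {drop_ratio(5235, packets_out):.4%}")
```

Because the slowness is intermittent, a low overall ratio does not rule out short bursts in which most of those drops occur together.
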
  • it_consultant Member Posts: 1,903
    WiseWun wrote: »
    I have a few end users complaining about network performance from time to time. I'll be going on-site to see exactly what is "slow". I've checked the circuit utilization and it's only 5%, latency is good. Also checked the interface connected to the PC, no sign of packet loss/congestion. Packet capture shows a few TCP resets/re-transmission (about 3%).

    What am I missing here? What do you normally look for, based on experience? Any suggestions better than this would help!

    Everyone has good suggestions and I have a few:

    1 - Retransmits of 3% is higher than I would expect for the circuit utilization. Two questions: how fast is the circuit, and have you divided the circuit between voice and data? If the latter is true, then this might be a simple QoS fix.

    2 - When they are complaining about network performance, are they doing anything else on their computer? This happened to me from time to time; it turned out they were daisy-chained through an IP phone (therefore 10/100 instead of a gig) and they were also running Ocularis and pulling in a stream from 72 high-def cameras.

    3 - Moderate expectations. Your 10 Mb Ethernet-over-copper corporate circuit is extremely high quality and better than home internet, but users cannot and should not expect their Facebook browsing to be faster at work than it is at home. A 10 Mb corporate circuit is a different thing than a 100/50 DOCSIS 3 cable connection.
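
As a rough illustration of why point 1 above treats 3% retransmits as significant even on a lightly used circuit, the well-known Mathis approximation bounds the throughput of a single long-lived TCP flow by (MSS / RTT) * (C / sqrt(p)), with C about sqrt(3/2). The sketch below plugs in the roughly 6 ms round-trip time mentioned earlier in the thread; the 1460-byte MSS is an assumed typical value, and treating the 3% resets/retransmissions figure as the loss probability p is a simplifying assumption, so read the result as an order-of-magnitude estimate only.

```python
import math


def mathis_throughput_mbps(mss_bytes, rtt_seconds, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on single-flow TCP throughput (Mathis et al.)."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    bits_per_rtt = mss_bytes * 8 / rtt_seconds
    return bits_per_rtt * (c / math.sqrt(loss_rate)) / 1_000_000


# RTT of ~6 ms is from the thread; MSS of 1460 bytes is an assumption.
# Using the 3% retransmission figure as the loss rate is also an assumption.
lossy = mathis_throughput_mbps(1460, 0.006, 0.03)
clean = mathis_throughput_mbps(1460, 0.006, 0.0001)

print(f"~{lossy:.1f} Mbps per flow at 3% loss")
print(f"~{clean:.1f} Mbps per flow at 0.01% loss")
```

Under these assumptions a single flow tops out well below a 100 Mbps port at 3% loss, which fits users feeling slowness while the circuit graphs show plenty of headroom.
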
  • rcsoar4fun Member Posts: 103
    WiseWun wrote: »
    ...Maybe the issue is the VoIP phones? I'm going to try a direct connection and bypass the phone.

    Per Cisco this shouldn't matter. However, I have experienced these issues myself.
  • WiseWun Member Posts: 285
    The 3% is of the packets captured from the user's workstation, not the circuit. Sufficient bandwidth has been provisioned for voice traffic, and the circuit is 100 Mbps.

    I tracked down the server interface, and it's clean. Users aren't complaining anymore, so that takes the monkey off my back, but I'm still curious to know the cause and will continue to monitor. Thank you guys for chipping in.