Ping times - how to determine if it's the ISP or an internal problem
--chris--
Member Posts: 1,518
in CCNA & CCENT
I am dealing with a network issue that comes and goes randomly. I might be leaving out some details, but I don't want to bore anyone, so I will cut to the chase.
The customer has a static public IP. If I ping from workstation A to server A through the internal network, ping times are <1 ms. If I ping from workstation A to the gateway, ping times are <1 ms. If I ping from workstation A through this gateway to google.com, I get ping times that vary from 40 to 1500 ms and occasional dropped packets.
I ran a tracert to google.com, picked out the first hop into the ISP's network, and started pinging that. This yields low, consistent ping times (avg 5 ms). I then pinged the next hop in the ISP's network: again low ping times, around 5-8 ms. I then pinged the third hop in the ISP's network from the same workstation used for all the other tests, but now I am seeing pings that bounce between 40 and 100 ms. If I ping the next hop, ping times really take off and get into the 200s. All said and done, ping times go up to 1500 ms and then time out.
Does this sound like an ISP issue? Or did I make a leap in my logic somewhere?
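The hop-by-hop test above is easier to compare across runs if each run is reduced to min/avg/max and loss. A minimal sketch, assuming the output of English-language Windows `ping` has been saved as text (reply lines such as `Reply from 192.0.2.1: bytes=32 time=45ms TTL=57` and `Request timed out.`); the function name `summarize_ping` is hypothetical, other error lines (e.g. "Destination host unreachable") are ignored, and the sample lines are illustrative, not the results from this thread.

```python
import re
from statistics import mean

# Matches "time=45ms" and the sub-millisecond form "time<1ms" printed by Windows ping.
RTT_RE = re.compile(r"time([=<])(\d+)ms")

def summarize_ping(output: str) -> dict:
    """Summarize one saved Windows ping run: min/avg/max RTT (ms) and loss percentage."""
    rtts = []
    lost = 0
    for line in output.splitlines():
        match = RTT_RE.search(line)
        if match:
            # Treat "<1ms" as 0 ms; it only says the reply was faster than 1 ms.
            rtts.append(0 if match.group(1) == "<" else int(match.group(2)))
        elif "Request timed out" in line:
            lost += 1
    sent = len(rtts) + lost
    return {
        "sent": sent,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "min": min(rtts) if rtts else None,
        "avg": mean(rtts) if rtts else None,
        "max": max(rtts) if rtts else None,
    }

# Illustrative input only.
sample = """Reply from 192.0.2.1: bytes=32 time=45ms TTL=57
Reply from 192.0.2.1: bytes=32 time=210ms TTL=57
Request timed out.
Reply from 192.0.2.1: bytes=32 time<1ms TTL=57"""
print(summarize_ping(sample))
```

Running the same summary for each hop from the tracert makes the point where latency and loss start climbing easy to spot.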
Comments
MSP-IT Member Posts: 752
It sounds like you're approaching it the same way I would. It appears that the packets are being dropped at the ISP.
Have you attempted talking to your ISP?
--chris-- Member Posts: 1,518
They have made site visits 4 times since this started, and every time they said it's the internal network.
I just called them again and emailed the tech a screen shot of what I was seeing. He acknowledged my theory is probably correct and is escalating the issue. I am waiting on the next tier to contact me for (hopefully) resolution.
I am pretty sure I know what they were doing wrong on the site visits. The tech would ping the ISP gateway, get great ping times, and say "yeah, it's an internal problem". It's also sporadic, so even if he did ping Google, it would have to happen during one of the slow periods, which would be pure luck.
Either way, I got the ISP to take ownership of this issue. It's a big win for me, since this is a new customer I went out of my way to talk into signing a contract with us. Two other companies were unable to resolve this issue, and it's been ongoing for a few months. Not to mention that this single issue has consumed me over the past few days.
For anyone that is curious, this is what I was seeing:
Heero Member Posts: 486
PingPlotter is a better way to do what you are doing manually. If you see increased latency on every hop after the first hop with high latency, it is a pretty fair bet that the link between the last hop without the latency and the first hop with the latency is heavily congested (or has some other issue). Take the PingPlotter info and go to your ISP with it. They will be able to look and see if there is an issue with that hop. If they refuse or tell you they don't see any issues, complain and escalate until you get someone who actually knows what they are doing.
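Heero's rule of thumb can be written as a small function: given one median RTT per hop in traceroute order, find the first hop after which latency stays elevated on every later hop, and report the link just before it. A sketch, assuming the medians (in ms) have already been collected; the function name and the 30 ms `threshold_ms` are arbitrary, hypothetical choices. A single slow hop that later hops do not inherit is ignored on purpose, because routers often deprioritize answering pings addressed to themselves.

```python
from typing import Optional, Sequence, Tuple

def suspect_link(medians_ms: Sequence[float], threshold_ms: float = 30.0) -> Optional[Tuple[int, int]]:
    """Return (last_good_hop, first_bad_hop) as 1-based hop numbers, or None.

    A hop counts as 'bad' only if it AND every later hop exceed the baseline
    (the lowest median seen before it) by more than threshold_ms.
    """
    for i in range(1, len(medians_ms)):
        baseline = min(medians_ms[:i])
        if all(rtt - baseline > threshold_ms for rtt in medians_ms[i:]):
            # 0-based index i is the first bad hop, so 1-based hops are i and i + 1.
            return (i, i + 1)
    return None

# Illustrative medians shaped like the thread's description, not real measurements.
print(suspect_link([1, 5, 6, 70, 220, 240]))
```

Here the function points at the link between hop 3 and hop 4, which is the hop pair to put in front of the ISP.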
--chris-- Member Posts: 1,518
This is why I love TE. Thanks, I'll be working that into my routine.
xnx Member Posts: 464
Got to be an ISP problem without a doubt.
Ryuksapple84 Member Posts: 183
Really liked how you approached this problem. It is definitely an ISP issue.
--chris-- Member Posts: 1,518
Thanks! They finally agree as well.
But the downside is that the ball is now in their court... and we wait. And wait...
Deathmage Banned Posts: 2,496
This might be helpful: now that I work at an MSP that is also an ISP, I get the low-down on the internet.
TWC has been having major issues for the past few days, and it is having a domino effect on other ISPs...
Something worth checking: do a tracert and see if any of the hops are owned by TWC...
--chris-- Member Posts: 1,518
Good to know. Another ISP in the area has been having issues, but not this one (at least not system-wide like that).
The ISP's tier 2 tech called me back and said he is seeing them bump into their upload bandwidth ceiling. Hmmm... we set up a time on Monday to troubleshoot this with me on site.
They do very little online: no backups, data transfers, etc. I should have more info Monday.
--chris-- Member Posts: 1,518
The ISP tech said he is pretty certain the issue is the result of the customer occasionally bumping into their upload ceiling. I don't know, though... why would the ping times get worse after the first two hops into the ISP's network?
While I am skeptical about this, I did a small test to see if it really could be the case. The customer allows employees (doctors, and if you have supported them you know how coddled they are) to stream media to tablets during lunch and breaks. I got the owner to agree to cut off the wireless for the week to see if the issue gets better... which it did.
The other possibility is that someone in another building or business was stealing the wireless and eating up the bandwidth. They are using WPA2 with a 16-character passphrase, but it had been unchanged for at least two years. Is it likely that a script kiddie could break WPA2?
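For a rough sense of the WPA2 question: WPA2-Personal derives its 256-bit pairwise master key (PMK) from the passphrase with PBKDF2-HMAC-SHA1, using the SSID as the salt and 4,096 iterations, so every guess is deliberately expensive. The sketch below shows that derivation with Python's standard library, plus the size of the search space if all 16 characters were picked at random from the 95 printable ASCII characters. The SSID, the passphrase, and the guess rate are hypothetical values for illustration only; a passphrase built from dictionary words or patterns is far weaker than the random-character estimate suggests.

```python
import hashlib

def wpa2_pmk(passphrase: str, ssid: str) -> bytes:
    """WPA2-Personal PMK: PBKDF2-HMAC-SHA1, SSID as salt, 4096 iterations, 32 bytes."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

def brute_force_years(length: int, alphabet_size: int, guesses_per_sec: float) -> float:
    """Worst-case years to try every passphrase of the given length and alphabet."""
    seconds = alphabet_size ** length / guesses_per_sec
    return seconds / (365 * 24 * 3600)

# Hypothetical SSID and 16-character passphrase, just to show the derivation.
pmk = wpa2_pmk("example-pass-16c", "ExampleSSID")
print(pmk.hex())

# Assumed (hypothetical) attacker speed of one million PMK guesses per second.
print(f"{brute_force_years(16, 95, 1e6):.2e} years")
```

Under those assumptions, exhausting the space takes on the order of 10^18 years, so a leaked or shared key is a far more likely explanation than a cracked one.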
Before I move on to the next step, I would like some input from others. Is what the ISP tech suggests likely, or even possible?
santaowns Member Posts: 366
Do you have access to view the NetFlow data via Scrutinizer? Pings show some latency, but behind a firewall and QoS they are less reliable: as the firewall gets busy it drops pings, and QoS drops pings too. Is this a Cisco environment? Check the WAN interface on both ends. CenturyLink, Sprint, and TWC have had major issues within the past week, especially CenturyLink MPLS... Don't ask how I know.
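What a NetFlow collector such as Scrutinizer reports as "top talkers" comes down to summing bytes per host across exported flow records. A minimal sketch, assuming the records have already been decoded into (source IP, destination IP, octets) tuples; the `FlowRecord` type, the `top_talkers` function, and the sample addresses are hypothetical, and a real collector also keys on ports, protocol, and time window.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

class FlowRecord(NamedTuple):
    src_ip: str
    dst_ip: str
    octets: int  # bytes carried by this flow

def top_talkers(flows: Iterable[FlowRecord], n: int = 5) -> List[Tuple[str, int]]:
    """Sum octets per source address and return the n heaviest senders."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.octets
    return totals.most_common(n)

# Hypothetical records.
flows = [
    FlowRecord("192.168.0.23", "203.0.113.10", 48_000_000),
    FlowRecord("192.168.0.5", "198.51.100.7", 120_000),
    FlowRecord("192.168.0.23", "203.0.113.11", 52_000_000),
]
print(top_talkers(flows, n=2))
```

One caveat: flows captured upstream of a NAT device all carry the same public source address, so per-device detail needs a collection point on the inside of the NAT.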
bertieb Member Posts: 1,031
Get some bandwidth stats from your ISP for the connection to prove your client is hitting the upload limit at the times you are having the problems. Ideally, you should also be able to monitor the ISP traffic yourself if they don't provide a portal where you can view it. It'll cut down on the problem tennis between yourself and the ISP, and you'll quickly be able to spot any contention.
Always work from proof, and if people 'think' it's 'X' causing the issue, get them to substantiate it. You're the customer here: don your best judge's court dress and prove things beyond any reasonable doubt...
How I'd go with this:
1) Get bandwidth stats and confirmation of the maximum download/upload limits
2) If they are nowhere near the contractual bandwidth limits when you're seeing problems, kick the ISP (again). Be wary that stats are often averaged; you'll be better off with a 'real time' view when you're seeing the problems, which is why an internal monitoring solution would be beneficial if they can't provide these quickly.
3) If they are exceeding the limits, you'll need to track down what's causing it. My guess would be traffic from devices that shouldn't be on the network, such as a user (or several) who has the wireless key and is using tablets/phones to watch Netflix at lunchtime.
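The warning in step 2 about averaged stats is easy to underestimate. A sketch with synthetic numbers: five minutes of per-second upload readings on a link with a hypothetical 3 Mbps limit, idling at 0.5 Mbps for four minutes and pinned at the limit for one. These readings are made up for illustration, not measurements from this customer.

```python
from statistics import mean

LIMIT_MBPS = 3.0  # hypothetical contractual upload limit

# Synthetic per-second readings: 240 s near idle, then 60 s pinned at the limit.
samples_mbps = [0.5] * 240 + [LIMIT_MBPS] * 60

# What a 5-minute graph would plot as a single point.
five_min_avg = mean(samples_mbps)
# How many seconds the link was effectively saturated (>= 95% of the limit).
saturated_secs = sum(1 for s in samples_mbps if s >= 0.95 * LIMIT_MBPS)

print(f"5-minute average: {five_min_avg:.2f} Mbps ({five_min_avg / LIMIT_MBPS:.0%} of limit)")
print(f"seconds at >=95% of limit: {saturated_secs}")
```

The averaged view shows about a third of the limit, while the per-second view shows a full minute of saturation, which is exactly when pings would be queuing.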
It does concern me that your ISP has made four site visits and blamed the internal network without providing any real evidence! If that were me, I'd have been more than unimpressed, because it smacks of incompetence, unprofessionalism, or knowing something is amiss but not fixing it, or a combination of all three.
--chris-- Member Posts: 1,518
With my employer, small businesses are our bread and butter. There isn't much in the way of infrastructure here. They are a new customer of ours that was left with this mess by another IT service company: literally just a D-Link residential gateway bridged to a Comcast business modem. The D-Link does the NAT'ing, and the server does the DHCP.
To your questions:
1) No firewall/QoS; it's a pretty straight shot with almost no hurdles.
2) We have been dealing with that issue with CenturyLink as well!
--chris-- Member Posts: 1,518
I was surprised the ISP came out four times as well. But that was before we got involved.
You might find it hard to believe (sarcasm), but I have been emailing/calling the tech I worked with for three business days without a response. I am going to have to start over with another tech.
Thanks for the tips; I will be using these. They should be able to give me some sort of documentation showing the bandwidth usage, regardless of how small the customer is, right? Or are those types of features only available to larger ($$$) customers?
ccnxjr Member Posts: 304
I've been guilty of blaming the ISP when we were actually hitting our service caps.
I'll add my vote to setting up some sort of real-time (or close to it) monitoring on the actual bandwidth usage.
It also helps if you know what the contractual usage is supposed to be.
However, the few times that I did call in during an event, they were able to identify whether or not we were hitting our service cap over the phone, without having to perform a site visit.
So, that part does indeed seem odd (however, it did require an escalation over the phone).
By the third instance we got real-time monitoring in place to be sure that was the case before calling in.
It's also important that this be set up on the ISP-facing interface, to avoid the (very plausible) excuse that it's the customer router that's dropping/losing packets.
Just saw OP's responses.
It might be more difficult with a company like Comcast/Verizon, even if it is a business account.
Their service agreements are littered with conditionals and liability waivers.
I've learned a lot about the importance of reading the contract, no matter how boring or mundane it is!
They should still be able to see usage; however, the person you're on the phone with may not have that access.
Also, many of the service reps for companies like Verizon/Comcast/Time Warner are trained to shift responsibility to the customer and are not that good at actually solving problems.
santaowns Member Posts: 366
I realize that it is a small business, but most if not all "business" class services, even cable (some of our sites use cable modems too, even though we are a Fortune 10 company now), have a guarantee on service availability and quality. By the way, the CenturyLink issue is resolved.
Here is the information you need:
-Verify patching schedules on the office PCs (yes, updates take a lot of bandwidth) and set them to off-hours only.
-It will be very hard to find bandwidth usage using a D-Link, so keeping track of software updates etc. will be a necessity.
-Encourage your customers, big or small, to use business-class equipment so that you can provide them with better service and they get better service from the equipment (also, I bet your boss would love to sell it to them).
-Even a D-Link can block a website; suggest blocking bandwidth-hog sites like Netflix, Hulu, Pandora, Spotify, etc.
-You need to know exactly what the system is being used for: which websites? Any VoIP phones?
I can't think of much more on the spot, but I know you're catching the drift. Finally, I would suggest an upgrade to the next tier or a better service like Metro-E or OC-192 (haha, jk), but again, you get the idea.
--chris-- Member Posts: 1,518
I was looking into Snort as a free way to monitor bandwidth, but I never found anything suggesting it could actually measure bandwidth.
Any suggestions for software/devices? It doesn't need to be free/low cost if I could set it up and reuse it down the road.
The contractual limit is 3 Mbps, which I believe is pretty easy to reach with a dozen devices streaming on breaks.
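A back-of-the-envelope check on whether a saturated 3 Mbps link could plausibly produce RTTs in the 1,500 ms range: once the uplink is full, each packet waits behind whatever is already queued in the modem or router buffer, and that wait is simply queued bits divided by link rate. The buffer depths below are hypothetical; the real buffer size of the customer's equipment is unknown.

```python
def queue_delay_ms(queued_bytes: int, link_mbps: float) -> float:
    """Time for a FIFO buffer holding queued_bytes to drain at link_mbps, in milliseconds."""
    return queued_bytes * 8 / (link_mbps * 1_000_000) * 1000

LINK_MBPS = 3.0  # contractual limit quoted in the thread

# One full-size 1,500-byte packet vs. a few hypothetical buffer depths.
for queued in (1_500, 64_000, 256_000, 562_500):
    print(f"{queued:>7} bytes queued -> {queue_delay_ms(queued, LINK_MBPS):7.1f} ms extra delay")
```

At 3 Mbps a single full-size packet takes 4 ms to send, and roughly half a megabyte sitting in a buffer would add about 1.5 s, in line with the worst pings in the original post. If the bottleneck is the customer's own uplink, that queue sits before the first ISP hop, so the extra delay would be expected on pings to every hop measured at the same moment, which is the point raised earlier about latency only appearing past the first two ISP hops.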
Thanks!
--chris-- Member Posts: 1,518
Thanks for the list of "needs"; I will be using it.
Even though all of our customers are small businesses, almost all of them are on SonicWALLs, Cyberoams, even some Cisco. New customers aren't always brought up to speed on day one, depending on budgets.
The real hurdle is an apathetic owner. At any rate, thanks a lot, guys. I spoke with the customer again today and they are reporting nothing but good news, so the theory of someone eating up bandwidth through illegal wireless use is plausible. Although that's just a guess... as time goes by, we will audit and upgrade their infrastructure and procedures, and there will be fewer issues like this.
ccnxjr Member Posts: 304
We're using Cacti; our edge router is a Juniper SRX, so we can configure SNMP on it.
So, we can track how much each interface is sending/receiving, trace the bandwidth hog down to the port level and unplug him!
Cacti lends itself well to monitoring all sorts of fun stats on a router/switch through SNMP polling.
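Under the hood, an SNMP grapher like Cacti polls an interface's octet counters (IF-MIB `ifInOctets`/`ifOutOctets`, or the 64-bit `ifHCInOctets`/`ifHCOutOctets`) at a fixed interval and turns the difference into a rate. A sketch of that calculation, assuming two readings of the same counter taken `interval_s` seconds apart; the SNMP polling itself is left out, and the function name is hypothetical.

```python
def octets_to_mbps(prev: int, curr: int, interval_s: float, counter_bits: int = 32) -> float:
    """Convert two octet-counter readings into the average Mbps over the interval.

    Modulo arithmetic absorbs a single counter wrap. A 32-bit counter on a busy
    link can wrap more than once between slow polls, which this cannot detect;
    that is why the 64-bit HC counters are preferred when the device offers them.
    """
    delta = (curr - prev) % (2 ** counter_bits)
    return delta * 8 / interval_s / 1_000_000

# 22.5 MB sent in a 60-second poll interval averages out to 3 Mbps.
print(octets_to_mbps(0, 22_500_000, 60))
# Counter wrapped between polls: 1,000 octets before the wrap plus 500 after.
print(octets_to_mbps(2**32 - 1_000, 500, 1))
```

Whether a given SOHO router exposes these counters over SNMP at all depends on the model, which is the open question for the D-Link.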
I'm not sure if it'll work the same on a D-Link.
Of course, if they were using a Meraki device you get all sorts of fun stats in the web portal.
However I can't confirm if interface bandwidth is one of them.
There's all sorts of good reasons for getting a Juniper/Cisco/Meraki (also Cisco?) enterprise grade router.
However, it might be worth a few minutes to ask the rep over the phone, as a by-the-way, if they can give you an estimate of your current usage.
This is not an unreasonable request if they're claiming that you're hitting the usage cap.
If they can't determine (or tell you) how much you're using, then how do they know you're hitting the usage cap?
Do they hook you guys up on some enchanted port?
--chris-- Member Posts: 1,518
Wow, I forgot about Cacti! I helped set that up at an internship, alongside a Zabbix setup. Unfortunately, I knew so little about networking at the time that I retained none of what I did.
I am going to call the 800 number tomorrow and push the issue with another rep. The current one has ignored every email/VM since we first spoke. Thanks guys.
So Cacti installs on a Windows or *nix box, then... polls network gear via SNMP? So I would need enterprise/business-class networking equipment to set up Cacti for monitoring, right? I doubt the 4-port D-Link router/AP/switch is going to offer that.
But this is good to know for other clients, since most run business-class stuff.
ccnxjr Member Posts: 304
Yeppers!
Cacti is a pretty basic SNMP poller/grapher.
I use the term "Enterprise" mostly because I'm not sure whether or not SoHo equipment will allow for SNMP monitoring.
More importantly, bandwidth tracking per interface.
It might be worth a few minutes to google the make/model + SNMP to see what your options are.
--chris-- Member Posts: 1,518
The ISP said they have no way to show me usage/bandwidth. I asked how the tier 2 tech was able to determine that's what the issue was. They said they would have to call me back in 4 hours; he did not know, and the ticket notes did not say. That was two days ago.
Typical?
Deathmage Banned Posts: 2,496
Who is your ISP?
lsud00d Member Posts: 1,571
Dude... there's no WAY an ISP doesn't have NetFlow enabled and monitored.
However, it is possible that they refuse to share that info with you. Can any NOC'ers/ISP techs comment on sharing NetFlow data with customers?
--chris-- Member Posts: 1,518
I'm not surprised. Support is all over the place when dealing with these people, stories change every time I call back.
As far as changing ISPs, it could be suggested to the owner, but making the case to a small business owner who just switched from another ISP to get better speeds won't be easy.
I don't know if there is a middle ground in ISPs. You either get the $60/month package or move to an enterprise-level provider with costs starting at $400/month and guaranteed service levels.
I love supporting SMBs, but I look forward to the day I can pull my hair out over large business issues.