Logs Oh So Important
the_Grinch
Member Posts: 4,165 ■■■■■■■■■■
in Off-Topic
I work in regulatory enforcement, dealing with the IT side of the house (obviously). We set up a pretty extensive SIEM for monitoring, and each time I use it I realize how difficult things would be if we didn't enforce the logging requirements we have. Being able to back up the things you say with data has made life vastly easier. Heed this, my friends: you can have a very limited budget and still have an amazing monitoring tool. With the right data, making your case is much easier.
WIP:
PHP
Kotlin
Intro to Discrete Math
Programming Languages
Work stuff
Comments
-
UnixGuy Mod Posts: 4,570 Mod
Good work, can you share an example of the benefits you got out of implementing a proper SIEM?
-
docrice Member Posts: 1,706 ■■■■■■■■■■
Not only logs, but any additional metadata that helps reinforce your findings. NetFlow records or packet captures help make the case quite visibly when the timestamps match up.
Hopefully-useful stuff I've written: http://kimiushida.com/bitsandpieces/articles/
-
lsud00d Member Posts: 1,571
All of these are great points; SIEMs or log aggregators are indispensable tools in any environment. Another equally, if not more, important tool/ability is being able to cross-correlate manually. I can't tell you how many times I've had to log-ninja between different servers, appliances, OSes...anything possible in the mix, to try to understand what happened.
-
the_Grinch Member Posts: 4,165 ■■■■■■■■■■
You guys nailed it! Currently we have a separate system for NetFlow data, but we are going to fold it into our current SIEM so we have everything in one searchable interface. To elaborate on our setup, I'll have to cover what we are looking for/at.
1. File integrity - each monitored system has a set of files that we (along with the provider) deem critical. We have to be notified of any change to these files, and the change has to be tested/approved before it is implemented (a rough hash-check sketch follows this list)
2. Logins - there are restrictions regarding who may log into these environments, so we monitor every login and logoff
3. Errors - errors (along with some other indicators) often point to larger issues, so we watch a number of related events
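Item 1 is what OSSEC handles for us; purely as an illustration of the idea (not our actual OSSEC config), here is a minimal Python sketch that hashes a hypothetical list of critical files and flags anything that drifts from a stored baseline:

```python
import hashlib
import os

# Hypothetical critical-file list and baseline hashes; in our setup OSSEC
# tracks these and raises the alert for us.
CRITICAL_FILES = ["/etc/passwd", "/opt/app/config.ini"]
BASELINE = {
    "/etc/passwd": "placeholder-sha256",
    "/opt/app/config.ini": "placeholder-sha256",
}

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

for path in CRITICAL_FILES:
    if not os.path.exists(path):
        continue
    current = sha256_of(path)
    if current != BASELINE.get(path):
        print(f"ALERT: {path} changed (expected {BASELINE.get(path)}, got {current})")
```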
We have OSSEC installed on each of the servers housing critical files and on the web servers. These report back to a central OSSEC server, which in turn has Logstash installed to aggregate the data and format it the way we prefer. I say this because in the case of, say, NetFlow, we'll be tagging the data with the provider it is coming from. So while we can look at all the NetFlow data as a whole, we will also be able to have a dedicated panel for just one provider. Once this data is aggregated (this is done on the fly, in real time), it is pumped into our Elasticsearch cluster. Utilizing Kibana, we wrote search queries for all the particular data we want to see.
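As a rough Python stand-in for that provider-tagging step (our real pipeline does this in Logstash; the endpoint, index name, and host-to-provider mapping below are made up for the example):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed Elasticsearch endpoint
PROVIDER_BY_HOST = {"10.0.1.5": "provider-a", "10.0.2.7": "provider-b"}  # hypothetical mapping

def index_event(event):
    """Tag an event with its provider, then index it into Elasticsearch."""
    event["provider"] = PROVIDER_BY_HOST.get(event.get("host"), "unknown")
    req = urllib.request.Request(
        f"{ES_URL}/netflow/_doc",  # hypothetical index; the document API path varies by ES version
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

index_event({"host": "10.0.1.5", "bytes": 4096, "dst": "203.0.113.9"})
```

Because every event carries a provider field, a Kibana panel can be filtered down to one provider while the combined view still shows everything.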
As an example, I have a panel that shows every login to every server that we monitor. I also have a panel with a breakdown of all the critical files we monitor; if they change, I see it on a bar graph outlining the number of changes detected. Once the NetFlow is added, I'll be able to see bandwidth utilization, automatic whois data, and geolocation on IPs, and display that data on a map. Also, with the integration of d3.js, we've been able to build maps of test NetFlow data. One map builds itself automatically and shows which IPs are talking to each other and to the outside world.
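Behind a panel like the login one is just an Elasticsearch search; a hedged sketch of that kind of query in Python (the logstash-* index pattern, the "Accepted" match on login messages, and the field names are assumptions, not our exact query):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"

# Roughly what a "logins in the last 24 hours" panel boils down to.
query = {
    "size": 50,
    "query": {
        "bool": {
            "must": [{"match": {"message": "Accepted"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
        }
    },
}

req = urllib.request.Request(
    f"{ES_URL}/logstash-*/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for hit in json.load(resp)["hits"]["hits"]:
        print(hit["_source"].get("host"), hit["_source"].get("message"))
```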
Now the above just scratches the surface of what we are doing. It gives us the initial alert, but from there we dive a bit deeper. Say a file changes: I take the initial query that tells me it happened and input it into another window. This tells me all the specifics: the file name, current hash, previous hash, and what time the file changed. I check this against our notification of approved changes (this is still manual, but I plan on changing that). If I see an approval, I confirm that the hashes match and then I move on. If there isn't an approval, I dig deeper. I'll look at a larger window of time and see who logged in prior to the change. I'll also check what commands were run (on Linux boxes only).
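The approval check itself is simple enough to automate eventually; a toy version of the logic, assuming a hypothetical APPROVED_CHANGES mapping of file path to the hash we expect after the change:

```python
# Hypothetical record of approved changes: file path -> hash we expect afterwards.
APPROVED_CHANGES = {
    "/opt/app/config.ini": "cd34ef56",
}

def review_change(path, current_hash):
    """Decide what to do about a detected file change."""
    expected = APPROVED_CHANGES.get(path)
    if expected is None:
        return "no approval on record - dig deeper (logins, commands run)"
    if expected == current_hash:
        return "approved change and hashes match - move on"
    return "approval exists but hash differs - dig deeper"

print(review_change("/opt/app/config.ini", "cd34ef56"))
print(review_change("/etc/passwd", "ab12cd34"))
```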
As you can see, it's still a fairly manual process, but only when it comes to digging in deep. I will say it is sort of like the Matrix, because once you dive down that rabbit hole you see how deep it is. Of note was when an engineer told me that one provider was experiencing a slowdown they couldn't track down. I started searching and saw that one particular system had a very high number of Nagios logins (Nagios is what they use for monitoring system health). I was able to go back a whole week and compare that particular hour on each day of the week to see what was normal for Nagios. It was clear something was up on this server, because there was a much higher number of logins, which usually means Nagios is alerting more often since it is configured to check more often when an error is detected.
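That week-over-week comparison is essentially a per-hour count of login events for one host; a sketch of that kind of aggregation against Elasticsearch (the host name, index pattern, and field names are assumptions):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"

# Count Nagios-related logins per hour on one host over the past week,
# so each hour can be compared with the same hour on other days.
query = {
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {"match": {"host": "app-server-01"}},   # hypothetical host
                {"match": {"message": "nagios"}},
            ],
            "filter": [{"range": {"@timestamp": {"gte": "now-7d"}}}],
        }
    },
    "aggs": {
        "logins_per_hour": {
            # newer Elasticsearch versions call this parameter fixed_interval
            "date_histogram": {"field": "@timestamp", "interval": "1h"}
        }
    },
}

req = urllib.request.Request(
    f"{ES_URL}/logstash-*/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for bucket in json.load(resp)["aggregations"]["logins_per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])
```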
Cost-wise, we are currently monitoring 250 servers with just three COTS servers (about $2,500 apiece). The software was completely free, and while there was a bit of a learning curve, everything we needed was online. Open source has quite literally saved us about $72,000 in licenses alone.
-
the_Grinch Member Posts: 4,165 ■■■■■■■■■■
OSSEC - File integrity and pulling logs
Logstash - Parsing the logs/aggregating data into JSON format
Elasticsearch - Stores the data and allows querying (also has built-in redundancy, so 2 of our 3 servers could die and we could still get our data - see the replica sketch after this list)
Kibana - Front end we use to query Elasticsearch - you can make all kinds of charts and graphs; we've also updated it to allow exporting to CSV
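The "two of three servers could die" property comes from Elasticsearch replica shards; a minimal sketch of creating an index with two replicas (the index name is made up; on a three-node cluster, two replicas means every shard has a copy on every node):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"

# number_of_replicas = 2 keeps two extra copies of every shard, so on a
# three-node cluster the data survives the loss of two nodes.
settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 2}}

req = urllib.request.Request(
    f"{ES_URL}/ossec-logs",  # hypothetical index name
    data=json.dumps(settings).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```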
We've been operating since March 27th and haven't archived any of our data yet. With 250 servers we are looking at 190 gigs, including the replication/redundancy. I will say that currently we do not have the ability to get email-based alerts. We are working on it, but given our environment the current setup works just fine for our purposes.
-
Kragster Member Posts: 44 ■■□□□□□□□□
the_Grinch: Thanks for sharing this info, it is a huge help. I'm looking at implementing some log management and SIEM tools this year with, of course, zero budget for it. Just out of curiosity, was there a reason for using physical boxes instead of virtual servers?