Logging Stack Design Considerations
Hi,
Sorry if I get a little long-winded. Your input is appreciated.
After another discussion (http://www.techexams.net/forums/off-topic/106145-what-some-logging-solutions-cisco-devices.html) with some comments the_grinch made in it and this thread (http://www.techexams.net/forums/off-topic/106591-logs-oh-so-important.html), I decided to take some time and spin up a VM to get the first two pieces of the ELK stack working. At the moment, Elasticsearch and Logstash are operational, collecting logs from three devices in my small home lab. I did run into a few gotchas during the setup but was able to work through them using Google. Overall, the process wasn't that bad, but it does raise some questions regarding the design of a logging stack.
During my searches I came across several variants for logging stacks and want to explore some of the pros and cons. Ultimately, simplicity is the main goal: at some point in the future, this system will be handed off to someone else, and besides that, I don't want to spend 25 hours a week maintaining it either. Some variants I have seen use Redis in the middle, another uses rsyslog with Logstash picking up from a file, another pipes directly to Logstash before dumping into Elasticsearch, and yet another uses graylog2 to sit between the endpoint and Elasticsearch.
When I first started digging into this, I hadn't thought about an overall design. I was merely interested in getting something to work. Now that I have a very simple install working, I started thinking: how would this be implemented in my enterprise network? Plus, if you add in the suggestions to implement OSSEC on the servers and the potential desire to use TLS for shipping logs where possible, a seemingly simple endeavour gets a little complicated.
Just as a reminder, I would be looking at logging for around 350+ servers, which include Windows, Linux, AIX, UCS, ESXi hosts, NetScalers, and all iLO and DRAC messages. On top of that, there are some Apache logs, some IIS logs, email logs from Zimbra, all the WAN routers, ASAs, a firewall, and another 100+ internal networking devices. Down the road, there may be a need/interest for printers, power infrastructure, building automation systems, and some medical devices too.
The question at this point is: what is the best way to get logs from an endpoint (EP) to the Central Log Repository (CLR), then to the Elasticsearch Database Server (ESS), so they can be viewed by a Frontend UI Server (UIS)? I see the overall setup as EP => CLR => ESS <=> UIS. I don't know whether the ELK stack or what I am going to term the EGL stack will be best. I think detailing some options as I see them may help determine the best fit, or at least put it all out there for consideration by those of you much more versed in the subject than I am.
# Endpoint
Using rsyslog on *NIX seems like a no-brainer. It is there by default on RHEL/CentOS and really doesn't need much to get it shipping logs, plus it supports encrypting the connection. For Windows, the options are less obvious, but it seems that using NXLOG to ship event logs is the best choice. It also supports TLS encryption. Most everything else will support sending to rsyslog out of the box; however, whether everything else can encrypt its logs will have to be determined later.
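For illustration, the client-side rsyslog forwarding I have in mind looks roughly like the sketch below. This assumes TLS via the rsyslog-gnutls package and legacy-style directives from the rsyslog 5/6 era; the CA path and server name are placeholders, not anything from a real build. (NXLOG's om_ssl output module would play a similar role on the Windows side.)

```
# /etc/rsyslog.conf (client) -- minimal TLS forwarding sketch
# TLS support comes from the rsyslog-gnutls package
$DefaultNetstreamDriverCAFile /etc/pki/rsyslog/ca.pem
$DefaultNetstreamDriver gtls

$ActionSendStreamDriverMode 1                 # require TLS for this action
$ActionSendStreamDriverAuthMode x509/name     # verify the server certificate
$ActionSendStreamDriverPermittedPeer logs.example.com

# Forward everything over TCP (@@) to the central server on 6514
*.* @@logs.example.com:6514
```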
I have to admit, I am not really familiar with OSSEC's logging options. How would OSSEC fit into the mix? Would OSSEC write logs to the local filesystem, to be shipped via the normal methods, or would OSSEC take over the shipping duties, negating the need for NXLOG on Windows? Also, how easy would it be to integrate OSSEC after setting up something else to ship logs?
Another question related to the endpoint: what should be logged? Are you logging everything, or just certain things to cut down on network chatter?
# Log Repository
One big question I have is whether it is better to ship to a plain old rsyslog server and have logstash/graylog2 pick up the logs from there, or to use logstash/graylog2 directly for parsing and dumping to the backend. Also, how are people dealing with HA?
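To make the first variant concrete, here is a rough sketch of the "plain old rsyslog server" option: rsyslog spools incoming messages to one file per sending host, and Logstash tails those files with its file input. All paths and the port are placeholders, and the elasticsearch output's parameter names vary between Logstash versions, so treat this as a sketch rather than a tested config.

```
# /etc/rsyslog.conf (central server) -- accept TCP syslog, spool one file per host
$ModLoad imtcp
$InputTCPServerRun 514

$template PerHost,"/var/log/remote/%HOSTNAME%.log"
*.* ?PerHost
```

```
# logstash.conf -- tail the spooled files and index them into Elasticsearch
input {
  file {
    path => "/var/log/remote/*.log"
    type => "syslog"
  }
}
output {
  elasticsearch { host => "localhost" }
}
```

One nice property of this layout is that the flat files double as a buffer: if Logstash is down for maintenance, the raw logs keep landing on disk and get picked up when it returns.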
# Elasticsearch Backend
I think this is the one piece I don't have any questions about, other than resource allocation, which I can figure out through trial and error. However, your input is still welcome.
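The one setting I do plan to pin down before the trial and error is the JVM heap. For the 1.x-era RPM packaging, a commonly cited starting point looks like the sketch below; the value shown is just an example for a lab VM, not a recommendation for any particular box.

```
# /etc/sysconfig/elasticsearch (RHEL/CentOS packaging, ES 1.x era)
# Rule of thumb: roughly half the machine's RAM, capped around 30-32 GB,
# leaving the remainder for the filesystem cache
ES_HEAP_SIZE=4g
```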
# User Interface
Since I have not gotten around to installing Kibana yet, I can only go off the articles I have read. It looks pretty amazing and seems very versatile, except for one thing: I cannot find anything regarding setting up multiple logins and locking access down. Graylog2 seems to have that covered. I am going to set up both for review, but are there any other obvious pros and cons I should know about?
Again, thanks for any input. I appreciate you taking the time to read this.
Regards,
Comments
the_Grinch:
OSSEC collects the information in real time and passes it back to the central manager, which stores the logs or ships them elsewhere. Logstash in turn parses these logs and pushes them into Elasticsearch. The integration was pretty easy for us; as for what you can get, I believe it's just about anything you choose to have it monitor. We monitor security logs, logins/logoffs, file integrity, and Apache logs. So technically, you could stick with rsyslog for your Linux boxes and use OSSEC on the Windows boxes.
Basically, Logstash will listen for OSSEC/rsyslog and parse that into JSON format to be dumped directly into Elasticsearch. As for making it enterprise-ready, it is simple enough: Logstash clusters just like Elasticsearch, and as you need more capacity you can add additional Logstash nodes. To give you an idea, we have about 200 servers reporting to a single instance of Logstash.
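For anyone following along, a bare-bones Logstash config for this pattern might look like the sketch below. The port is arbitrary, the filter block is where site-specific grok patterns would go, and the elasticsearch output's parameter names differ between Logstash versions.

```
# logstash.conf -- listen for syslog traffic, parse it, push JSON into Elasticsearch
input {
  syslog {
    port => 5514
    type => "syslog"
  }
}

filter {
  # the syslog input already extracts host, timestamp, and program;
  # grok/mutate filters for application-specific formats would go here
}

output {
  elasticsearch { host => "localhost" }
}
```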
Ah, you've found the "flaw", as it were, of Kibana. We went to a two-day course on Elasticsearch in May, and at that time Elasticsearch's stance was that security was the problem of those implementing it. Fast forward to today: after a huge infusion of cash, they've learned they had to do something. Right now they are developing a product called Shield, which will give you the type of security you are looking for: logins, auditing, and control over who can access what. Shield costs money, which is the only part that sucks about it. As of right now we have logins to Kibana set up via Apache. We've been looking for something a bit better, and at the moment it looks like implementing Nginx is the way to go. It will take a bit of work, but you could then get just about everything you get in Shield without the cost.
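For the Nginx route, a sketch of the usual pattern for Kibana 3 is below: serve the static Kibana files behind basic auth and proxy only the Elasticsearch endpoints the dashboard needs, so browsers never hit port 9200 directly. The hostname, install path, and htpasswd file are all placeholders.

```
# /etc/nginx/conf.d/kibana.conf -- basic-auth front door for Kibana 3
# create the password file with: htpasswd -c /etc/nginx/.htpasswd someuser
server {
    listen 80;
    server_name kibana.example.com;

    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Kibana 3 is just static files
    location / {
        root  /usr/share/kibana3;
        index index.html;
    }

    # proxy only the Elasticsearch paths Kibana actually uses
    location ~ ^/(_aliases|_nodes|.*/_search|.*/_mapping)$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
    }
}
```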
There are other measures you could implement to secure the setup. Use iptables to limit who can access Kibana (on top of logins via Apache or Nginx). Elasticsearch can be set up to cluster only with specific IP addresses instead of using multicast discovery. You can also set a non-default cluster name so someone cannot randomly start a node and connect. Logstash can likewise be set up so that curl commands cannot be run from those nodes, in case you don't want someone using Logstash to query Elasticsearch.
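To illustrate the cluster-side hardening, the snippets below use the Elasticsearch 1.x setting names for pinning the cluster name and switching from multicast to unicast discovery, plus an iptables example restricting the HTTP port. All addresses and names are made up.

```
# elasticsearch.yml -- non-default cluster name, unicast-only discovery
cluster.name: prod-logging
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
```

```
# iptables -- allow only the proxy host to reach Elasticsearch's HTTP port
iptables -A INPUT -p tcp --dport 9200 -s 10.0.0.20 -j ACCEPT
iptables -A INPUT -p tcp --dport 9200 -j DROP
```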
alias454:
I just wanted to update this thread and share my progress. I have to reiterate that logging is not as simple as it looks, at least not in the world of open-source logging stacks. To sum up my quest, I set up two separate stacks for testing: one using the ELK components and one using graylog2 with an Elasticsearch backend. In both cases, the POCs were easy enough to get up and running. The ELK stack is by far the leader in ease of setup, but there are a few things I liked about graylog2 that ultimately swayed my decision to go that route.
Along the way, I also decided to document my setup, which has become a very long document. I have included its table of contents below if you are interested in the 10,000-foot view. Thanks for your input along the way.
Table of Contents
Logging Infrastructure Overview
Design Considerations
Selected Design
Design Overview
Node Details
Build procedures
Partitioning layout
For ES data nodes, ES master nodes, and rsyslog nodes
For OSSEC node (Optional)
Install prerequisite apps
Setup cluster nodes
Setup elasticsearch on three nodes
Additional Setup for master node
Setup mongodb on master node
Setup graylog2-server on master node
Setup graylog2-web UI on master node
Setup rsyslog nodes
Setup OSSEC node (Optional)
Appendix A: contents for log4j.xml
Appendix B: contents for graylog-server init script
Appendix C: contents for graylog-web init script
the_Grinch:
I would definitely like to read your document. Obviously the joy of open source is being able to choose the software that suits the job. Would definitely like to see graylog in action.
phoeneous:
Logging will be one of my continued projects for 2015. A few months ago I set up rsyslogd + LogAnalyzer on a CentOS box as a POC for management. I actually might stick with it; it does what I need it to do and I like the interface. I haven't added any server nodes to it yet, just Cisco gear.
the_Grinch:
More than one way to skin a cat. One thing I've learned from a former teammate (he was promoted) was to start thinking five years ahead. What works now might be great in the short term, but as with most things, when people learn about it they want to use it. Case in point: we started down the Hadoop road for some of the stuff we do, and someone from another department caught wind of it. Now they'd like us to start working on their data; thankfully my former teammate planned for this and we are ready for it.
alias454:
@the_Grinch: I would be happy to let you read it, as I wouldn't mind getting some outside eyes on it anyway. I have some cleanup to do and a little bit of work to finish. However, as soon as that is done, I can post it here.
@phoeneous: I thought about going that route too, and I might scrap all of this and do that instead (you never know). I catch myself wondering if I am making this too complicated, but I think if we are going to capture the data, we should see if we can use it for some deeper insights into our environment.
alias454:
Apparently I cannot upload odt or docx files. It's 31 pages long, so a copy/paste into the forum won't cut it. I plan on turning this into a couple of wiki pages or a blog post (I haven't decided), but if you don't want to wait, I would be willing to email it if you PM me your email address.
Regards
alias454:
Link to the blog posts: Logging | Tech notes
There are ten parts, starting at part 1 of course. The basic gist is to start from a minimal CentOS install and follow all the steps in order.
Please feel free to alert me to any major errors or omissions. However, please remember it is still a work in progress.
Regards