by Justin Henderson, John Hubbard, Ismael Valenzuela
In this post, SANS instructors Justin Henderson, John Hubbard, and Ismael Valenzuela tackle some of the common questions they get from defenders looking to use their Security Information and Event Management (SIEM) platform as a high-impact detection tool.
If you are considering attending the Tactical Detection & Data Analytics Summit or have already registered, this post should serve as a good warmup, as many of these topics will be covered in-depth at the Summit on December 4-5, in Scottsdale, AZ.
What are the most common complaints you hear related to the use of SIEM in a Security Operations Center (SOC)?
I often hear folks state their SIEM is slow and that it cannot find anything useful. This occurs so much that the phrase “coffee break SIEM” has been coined. Searches that take minutes to complete are unacceptable. Careful planning and an understanding of how the search is executed can help prevent this. The real issue here is too many logs without emphasis on the logs that matter. This leads to mountains of alerts, long triage times, and an inability to identify unauthorized activity. The issue has multiple causes: (1) either collecting too much or not having enough hardware, which leads to slow query times (usually the former); (2) too much emphasis on log collection versus detection; and (3) a lack of automation and context.
The most common complaints have to do with the sheer number of alerts (a Damballa study indicates the average U.S. business has to deal with 10,000 alerts a day); the lack of context (is this really a false positive or not?); and the lack of visibility (not having the logs you need to detect certain attacks, while having too many of the “other logs,” that is, the not-so-useful ones).
Why are security professionals so frustrated with their SIEM? Is it a technology problem?
As much as we all would like to blame vendor technology for SIEM failure, it’s almost never a technology issue. In my experience, frustration with SIEM is a result of staff being trained on a specific tool (insert your SIEM tool here) versus having training on what data to collect, how to collect it, how to enrich it with context, and then ultimately how to use it to catch the bad guys. This is why the SANS course SEC555: SIEM with Tactical Analytics was created and has seen tremendous success. We all too often forget SIEM is just a tool. It is not about the tool. It is about how you wield it.
As with many other frustrations, it’s a matter of misplaced expectations. When you’ve been told by the vendor that SIEM will meet all your detection needs, or that it is all you need for your SOC, frustration is going to come sooner rather than later. So no, it’s not a technology problem, but more of a cultural problem. SIEM is an important element in the SOC, but it’s only one of them. As with anything else in security, there’s no substitute for having the right vision, processes, and people in place.
While teaching, I hear many complaints about SIEMs, from the speed of searches, to parsing complications, to difficulty visualizing and reporting information. If you look at recent surveys, this isn’t surprising; one found that only 48% of SIEM owners are satisfied with their purchase! That’s crazy, considering the lowest score of all airline companies comes in at 63% satisfaction.
What does that say about our SIEMs? To me, it seems many of these issues can be remedied with one simple solution – training. Many SIEMs are extremely capable, but the features are not always straightforward to use. To make things worse, organizations seem willing to spend hundreds of thousands of dollars on a SIEM, but then skip the analyst courses required to get the return on investment. This situation leads to a team that either must figure out the system as best they can while trying to do their day job, or a team that continues to use only the basic features they have discovered by exploring on their own.
With a SIEM, you aren’t done once you’ve purchased the hardware. People need to be trained and given the time to actually absorb the training. I think there would be significantly less frustration if this were standard practice.
What are some of the quick wins you recommend to improve the effectiveness of a SIEM?
Where to start!? Honestly, one of the most effective things an organization can do is to apply a generous portion of filtering or limit the data it collects. Think of this as a hygiene issue. If you do not take a shower you will, in an extreme case, get sick and die, or at a minimum, you will stink. The same things will happen if you don’t filter out the dirt and grime that comes with your logs. Many times, less is more. Also, major quick wins can be achieved by implementing basic detection principles. Monitoring for simple things such as key Windows events can provide high-fidelity detections, low false positives, and minimal logs. This is not limited to Windows. The only reason this task is difficult is that it is hard to know which data sources fall under this category and are easy to implement.
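To make the "less is more" idea concrete, here is a minimal sketch of allowlist filtering on Windows events. The specific event IDs, the dictionary field names, and the function name are illustrative assumptions, not a list recommended in this post; which events count as "key" depends on your environment.

```python
from typing import Iterable, Iterator

# Hypothetical allowlist of Windows event IDs. The post does not name
# specific events, so these are illustrative choices only.
KEY_EVENT_IDS = {
    1102: "audit log cleared",
    4688: "process created",
    4720: "user account created",
    7045: "service installed",
}


def keep_key_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only events whose ID is on the allowlist, tagged with a label."""
    for event in events:
        label = KEY_EVENT_IDS.get(event.get("event_id"))
        if label is not None:
            yield {**event, "label": label}


# Assumed event shape: a dict with "event_id" and "host" keys.
sample = [
    {"event_id": 4688, "host": "wks-01"},
    {"event_id": 5156, "host": "wks-01"},  # high-volume connection event, dropped
    {"event_id": 1102, "host": "dc-01"},
]
kept = list(keep_key_events(sample))
for event in kept:
    print(event["host"], event["event_id"], event["label"])
```

In practice this kind of filter usually lives in the log shipper or collection pipeline rather than in a script, so that the discarded events never reach the SIEM at all.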
My approach might sound a bit boring, but it’s highly effective, and I see it yielding positive results repeatedly when applied in a tactical way. It consists of breaking down the DETECTION game into smaller pieces, and focusing on the most critical ones first. On one side, we need to segment the problem into ZONES and determine the most prevalent threats for each of them. This is important because threats vary according to the zone. As an example, when looking at the DMZ, you definitely want to look at SQL injection attempts and webshell implants. On the other side, in the LAN, you want to look at client-side threats like phishing or watering hole attacks. Now, for each of these scenarios – DMZ and client-side threats – look at the different PHASES of the attack chain, and your ability to detect activity related to each of them. A consistent approach like this allows you to find blind spots, determine what logs you need to bring to your SIEM (or discard), and document your detection efforts in the form of use cases.
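One way to picture the zones-by-phases approach is as a simple coverage matrix. The sketch below is only an illustration: the phase names, the log sources in each cell, and the `blind_spots` helper are all assumptions made for the example, not a prescribed model.

```python
# Hypothetical attack-chain phases; the post does not name them.
PHASES = ["delivery", "exploitation", "installation", "command_and_control"]
ZONES = ["DMZ", "LAN"]

# Hypothetical coverage map: (zone, phase) -> log sources that give visibility.
coverage = {
    ("DMZ", "exploitation"): ["web server logs"],
    ("DMZ", "installation"): ["file integrity monitoring"],
    ("LAN", "delivery"): ["mail gateway logs"],
    ("LAN", "command_and_control"): ["DNS logs", "proxy logs"],
}


def blind_spots(zones, phases, coverage):
    """Return every (zone, phase) pair with no log source mapped to it."""
    return [
        (zone, phase)
        for zone in zones
        for phase in phases
        if not coverage.get((zone, phase))
    ]


gaps = blind_spots(ZONES, PHASES, coverage)
for zone, phase in gaps:
    print(f"No visibility: {zone} / {phase}")
```

Each empty cell is a candidate use case to document, and each filled cell tells you which logs are worth bringing into the SIEM.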
My advice is to take a step back and look at the data you are collecting versus your most important and common use cases. Are you even collecting the information sources you need to make high fidelity detections of modern attacks? If not, this is what needs to change. Often it’s not that the SIEM can’t perform the job required, it’s that the data just isn’t available to wield the SIEM properly.
Organizationally, we should work on making the config changes required to pick up data sources that have become increasingly important, like PowerShell logs, process creation logs, Exploit Guard, or Sysmon. Without some of these newer sources, your SIEM may be stuck collecting data that lacks the fidelity to detect dangerous conditions without frequent false positives, while the data needed to enrich it and improve the situation isn’t even available. Ensuring your data collection sources are in line with modern attacks, and using guides like MITRE ATT&CK to understand what you do need, can be a huge step in the right direction.
What data sources can give me the best visibility and detection capabilities?
This is a tough one in that there are many awesome data sources. Let me start with my favorite: plain old DNS logs. Add a little bit of log enrichment and you have one of the most powerful ways to catch things like C2, phishing domains, and more. They are also a fantastic way to reduce false positives and limit alert fatigue.
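As a rough illustration of what "a little bit of log enrichment" on DNS could look like, the sketch below adds a few fields to a query record. The record shape, the tiny allowlist standing in for a real known-domains list, and the choice of features (label length and character entropy) are assumptions for the example, not the specific enrichment taught in the course.

```python
import math
from collections import Counter

# Hypothetical allowlist standing in for a real "top sites" or known-good list.
KNOWN_DOMAINS = {"sans.org", "microsoft.com"}


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty or uniform text."""
    total = len(text)
    counts = Counter(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def enrich(record: dict) -> dict:
    """Return a copy of a DNS query record with extra context fields."""
    query = record["query"].rstrip(".").lower()
    labels = query.split(".")
    # Naive parent-domain guess: the last two labels. This is wrong for
    # suffixes such as co.uk; a real pipeline would use a public suffix list.
    parent = ".".join(labels[-2:])
    return {
        **record,
        "parent_domain": parent,
        "known_domain": parent in KNOWN_DOMAINS,
        "longest_label": max(len(label) for label in labels),
        "entropy": round(shannon_entropy(labels[0]), 2),
    }


benign = enrich({"query": "www.sans.org."})
odd = enrich({"query": "x7f3k9q2zt.example.net"})
print(benign)
print(odd)
```

Fields like these let an analyst sort out long, random-looking queries to unfamiliar domains quickly, which is where the reduction in false positives tends to come from.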
Outside of DNS, there are also Windows endpoint logs. Not just servers but actual endpoints like laptops and desktops contain Windows events that are amazing at catching adversarial activity. And yet, this task of endpoint log collection does not have to be high volume. Tactical endpoint log collection goes a long way even for things like modern-day PowerShell attacks. Endpoint logs make this possible and fun to detect.
Without getting into writing a full blog about this, I want to mention that I also love things like flow data, augmented IDS alerts (enrich, enrich, enrich), and other network service logs like HTTP, DHCP, and SSL/X.509 certificate information. Can we all just agree Security Onion/Bro for the win?
I wholeheartedly agree with Justin (always do), so nothing to add.
As Justin said – Bro is an outstanding source of information of all types. If there were a single monitoring tool I could use for network security monitoring, this would be it! Beyond that, I’m a huge fan of proxy logs…if you have them. Justin is correct that DNS is an outstanding source, but proxy logs or next-gen firewall logs that cut layer 7 level transaction data for outward bound traffic like HTTP can be an enormous help and go much further than what DNS alone can provide. DNS is probably the easiest and most bang for the buck you can get as a collection source, but gathering more detail can be even better when the option is available.
Does compliance dictate that I keep ALL logs for a given system?
No. No. And again, no. If this were true then everyone would be lying through their teeth. For example, most organizations believe they are compliant if they collect all logs from the Application, System, and Security event channels on Windows. Yet those are not “all logs” on a Windows system. There are hundreds more Windows event log channels and even more special event tracing logs. Some of these, like the PowerShell event channel, are highly effective at catching malicious and unauthorized activity. Yet this often is not included in your fulfillment of compliance via log collection.
A couple of things are worth noting. One, you are not collecting all log sources and there are more things you should collect such as PowerShell logs. Two, a large percentage of organizations trying to fulfill compliance requirements are erring way too much on the side of caution and could benefit from applying filters to remove a generous portion of what they are collecting. Most compliance frameworks are aimed at things such as user-attributable data and focusing on the spirit of compliance. Again, no organization is collecting all logs. This is a data point that can help you get auditors to sign off that you are meeting compliance regulations. The process should be: “This is how I am fulfilling my compliance requirements: I am collecting these logs that help me track user activity and malicious and unauthorized activity. Here is what I am filtering out, as it is high volume and does not help with any of these things.”
How much time should people spend maintaining their SIEM?
This totally varies from organization to organization. But it mostly varies by headcount. I will put this plainly: if you are spending 80 percent of your time within a SIEM tool doing alert review and analysis, then you are on the right track. If you are an organization that is instead focusing heavily on collecting more data sources, applying patches, or running compliance reports, then your SIEM implementation may not be tactical.
Obviously, this percentage varies depending on whether you are only now implementing a SIEM or have had a SIEM tool up and running for a couple of years.
Does scaling a SIEM have to be so costly?
I wish I had a better answer for this. If you are using a commercial SIEM tool and have a business requirement to only use commercial solutions, then the answer is likely yes. Many solutions are priced in a way that more logs equal higher costs. The discounts tend to not scale with your volume. Some vendors are learning to be more considerate of this, but many of the experiences I have had working with multiple clients have shown otherwise.
If you can augment your commercial solutions with open-source solutions, then the answer may be no. A lot of organizations are switching to what I call “dual stack” SIEM. This means you have one SIEM tool dedicated to compliance-type logs and one dedicated to tactical SIEM implementation. By using open-source solutions for one of these, you can cut costs and may be able to achieve more tactical objectives.
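The "dual stack" idea can be sketched as a routing decision made per log source. Everything named here (the source categories, the destination names, and the `route` function) is hypothetical and only meant to show the shape of the split.

```python
# Hypothetical source categories; which sources belong where is an
# organizational decision, not something this sketch can determine.
COMPLIANCE_SOURCES = {"authentication", "file_access_audit"}
TACTICAL_SOURCES = {"authentication", "dns", "powershell", "process_creation"}


def route(event: dict) -> list:
    """Return the destinations for an event; an empty list means filter it out."""
    source = event["source"]
    destinations = []
    if source in COMPLIANCE_SOURCES:
        destinations.append("compliance_siem")
    if source in TACTICAL_SOURCES:
        destinations.append("tactical_siem")
    return destinations


print(route({"source": "dns"}))
print(route({"source": "authentication"}))
print(route({"source": "debug_trace"}))
```

A source can feed both stacks when it serves both purposes, and anything that serves neither is dropped before it costs license volume.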
What about the use of Artificial Intelligence/Machine Learning? Is it mostly marketing buzzwords and hype, or is it really something organizations should start considering?
Please do not hurt me on this one. Let me start with the positive. Machine Learning (ML), user and entity behavior analytics, Artificial Intelligence (AI), or any other automated anomaly analysis software is a boon to security. It can be extremely powerful when used in conjunction with cybersecurity domain expertise and mapped to the knowledge of an organization.
Now for the bad. The vast majority of these systems are costly and end up generating a tremendous number of false positives. Anomalies are not the same as alerts and should not be treated the same. These solutions sound awesome, but I categorize these types of tools as a maturity item. Consider these later, after you have a successful and tactical SIEM. Ninety-nine percent of organizations would be better off spending their time focusing on what data sources they can collect and use to catch adversarial activity in their environments instead of trying to purchase something that seems like it can do this automatically.
One of the smartest people I have ever worked with is a data scientist. He teaches folks how to learn and apply Machine Learning and all the other techniques being marketed, but he is honest and openly states that implementing these tools and techniques without domain expertise and applying them against the knowledge of your organization is an exercise in futility.
I was waiting for this question!! It seems as if AI is something brand new, but in reality, we’ve been looking at it for almost 70 years now, with limited success. It’s true, however, that AI has resurfaced strongly with the rise of Machine Learning, and it is widely applied to many of our day-to-day activities thanks to applications in areas of perception (e.g., voice recognition – “Alexa, buy me a dollhouse!”) and cognition across many industries. AI is nothing new to security, either. Think about spam classifiers. How often do you check your spam inbox these days? Chances are you don’t even look at it, although there was a day when this was a big nuisance. The use of Machine Learning techniques like Naïve Bayes classifiers has proven reasonably effective at this job, to the point where we don’t even think about it.
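For readers who have not seen one, here is a toy Naïve Bayes spam classifier written from scratch. It is a minimal sketch with Laplace smoothing and a four-message training set invented for the example; real spam filters use far more data and features, and the function names here are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior; assumes every label appears at least once in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training data, for illustration only.
model = train([
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
])
print(classify("free prize inside", model))
print(classify("team meeting monday", model))
```

The classifier only counts words, which is why it works well on a narrow, well-labeled problem like spam and much less well on open-ended detection problems where labeled examples are scarce.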
Having said all that, there are fundamental limitations that prevent AI and ML from overcoming the challenges faced by the security industry on its own, and this is why we don’t yet see many practical applications of these techniques in the SOC other than some algorithms used in certain products that are meant to complement the analyst’s job. Some of them are mixed with other not-so-advanced analytics and deterministic methods, and they’re all re-packaged and sold as a brand-new AI/ML product.
I still believe this is the area where we’ll see more advances in the next few years, and there’s no doubt that security will need more data scientists and practitioners working together to keep up with the new evolution of attacks. After all, attackers also have access to AI and ML, and they won’t hesitate to use them against us. This is a field that we call “adversarial Machine Learning.”