Buckets of digital ink have been spilled on the topic of threat hunting. The challenge is arriving at a unified definition of this evolving security discipline. Analysts, including Anton Chuvakin of Gartner, have chimed in, and there is no shortage of vendors claiming everything from advanced workflows to artificial intelligence.
Let us take a step back and establish the principles of good threat hunting – putting aside the marketing spin for the latest hunting app.
Threat hunting is primarily a discipline performed by humans, and it begins with a state of mind. The assumption of compromise is essential to any threat hunting operation because it accepts that network intrusions are already underway. The goal of any hunting program is to find and eliminate the present danger, not to scan vulnerability points for hypothetical future concerns.
Good hunting operations recognize that the human brain is the best analytic information processor in all of creation. Hunting is a human activity, and we need a method for humans to do their best work. I spoke about this before in our 2017 Outlook.
There are several techniques and methods you can quickly implement to advance a threat hunting program. Let’s examine them in turn.
Though it may seem obvious, data assembly is the leading impediment to world-class hunting. The problem is not a lack of data but how the data is assembled. Hunters are often awash in data with little guidance on how to assemble it in meaningful ways. Good hunting begins with assembling data so that it can be properly interrogated.
Assembling data is like learning a language. It is a discrete, combinatorial exercise that can yield open-ended results. If you understand, “This is the cat that ate the mouse,” then nothing prevents you from understanding, “This is the mouse that ate the cheese.” By following a grammar – a code for assembly – the combinations are endless.
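To make the grammar analogy concrete, here is a minimal Python sketch of data assembly. It assumes two hypothetical sources, a firewall text log and JSON-style DNS records, whose formats are invented for illustration. Each source is parsed into one shared `Event` shape (who did what, to what, and when), which plays the role of the grammar, so records from different sensors land on a single timeline and can be interrogated together.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """The common 'grammar' every source is assembled into."""
    timestamp: datetime
    source: str   # which sensor produced the record
    actor: str    # the subject, e.g. a host IP or user
    action: str   # the verb, e.g. "resolve" or "connect:allow"
    target: str   # the object, e.g. a destination IP or domain


def from_firewall(line: str) -> Event:
    # Hypothetical format: "<epoch> <VERDICT> <src_ip> -> <dst_ip>:<port>"
    epoch, verdict, src, _arrow, dst = line.split()
    dst_ip = dst.rsplit(":", 1)[0]  # drop the port so targets compare cleanly
    return Event(datetime.fromtimestamp(int(epoch), tz=timezone.utc),
                 "firewall", src, f"connect:{verdict.lower()}", dst_ip)


def from_dns(record: dict) -> Event:
    # Hypothetical format: {"ts": ISO-8601 string, "client": ip, "query": domain}
    return Event(datetime.fromisoformat(record["ts"]), "dns",
                 record["client"], "resolve", record["query"])


def assemble(firewall_lines, dns_records):
    """Parse every source into Events and merge them into one timeline."""
    events = [from_firewall(line) for line in firewall_lines]
    events += [from_dns(r) for r in dns_records]
    return sorted(events, key=lambda e: e.timestamp)
```

Adding a new sensor means writing one more small parser into the same shape; nothing downstream has to change, which is the point of agreeing on a grammar first.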
Once we have assembled the data, we can move to seeing its connections.
Clustering is a simple statistical technique: take the assembled data and apply attributive details to each data element, a process known as data enrichment. Enrichment allows raw data to cluster with other data elements that share similar attributes. For example, an IP address as a standalone data element does not reveal much. Once it is enriched with additional attributes, however, it can form any number of combinatorial connections. It is through clustering that seemingly benign events become indicators of compromise (IOCs).
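A minimal sketch of enrichment followed by attribute-based grouping is shown below. The `ENRICHMENT` table, its attribute names and its values are hypothetical stand-ins for real lookups such as GeoIP or ASN data, and grouping on exact attribute matches is a deliberately simple substitute for statistical clustering. Two IP addresses that look unremarkable on their own end up in the same cluster because they share an ASN and country and were both first observed only days ago.

```python
from collections import defaultdict

# Hypothetical enrichment source; in practice this might be GeoIP, ASN/WHOIS
# or threat-intel lookups. All values here are illustrative only.
ENRICHMENT = {
    "203.0.113.7":  {"asn": "AS64500", "country": "XX", "first_seen_days": 2},
    "203.0.113.9":  {"asn": "AS64500", "country": "XX", "first_seen_days": 3},
    "198.51.100.4": {"asn": "AS64501", "country": "YY", "first_seen_days": 3650},
}


def enrich(ip):
    """Turn a bare IP into a record carrying whatever attributes are known."""
    return {"ip": ip, **ENRICHMENT.get(ip, {})}


def cluster(ips, keys=("asn", "country")):
    """Group enriched records whose values for `keys` are identical."""
    clusters = defaultdict(list)
    for ip in ips:
        record = enrich(ip)
        signature = tuple(record.get(k) for k in keys)
        clusters[signature].append(record)
    return dict(clusters)


def suspicious(clusters, min_size=2, max_first_seen_days=30):
    """Keep clusters with several members, all on recently observed infrastructure."""
    return {
        sig: members
        for sig, members in clusters.items()
        if len(members) >= min_size
        and all(m.get("first_seen_days", float("inf")) <= max_first_seen_days
                for m in members)
    }
```

Calling `suspicious(cluster(["203.0.113.7", "198.51.100.4", "203.0.113.9"]))` flags only the `("AS64500", "XX")` group; the thresholds are arbitrary and would need tuning against real data.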
Often, organizations use machine learning to handle the rapid enrichment needed for large datasets. We can allow computers to do the work of assembling and attributing our data, freeing up the time for hunters to, well…hunt.
Hunting teams must cross-reference what is happening internally (as revealed by the assembled data) with what is known externally. Using external sources such as Palo Alto Networks AutoFocus, VirusTotal and Symantec DeepSight, hunters can take what is unknown to them and learn what others already know about their own IOCs.
For example, now that everything has been clustered, a hunter may notice signs of persistence in an attack associated with a particular URL. The hunter can then look up the URL to determine what else is known about it, then pivot back to their own environment to locate the user, machine or service presently connected to the compromised site.
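The look-up-then-pivot workflow might be sketched as follows. The external intelligence service is modeled as any function that maps an indicator to a dictionary (or `None`); the `related` field, the `ProxyEvent` record and the stub used for testing are assumptions for illustration, and no real vendor API (AutoFocus, VirusTotal or DeepSight) is modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ProxyEvent:
    """One assembled web-proxy record from the hunter's own environment."""
    host: str
    user: str
    url: str


# Hypothetical interface for an external intelligence source: give it an
# indicator, get back a dict of what is known about it (or None).
IntelLookup = Callable[[str], Optional[Dict]]


def investigate(url: str, lookup: IntelLookup, events: List[ProxyEvent]):
    """Look up an internal indicator externally, then pivot back inside.

    Returns the external context plus every (user, host, url) in our own
    events that touched the indicator or any related indicator reported
    by the external source.
    """
    intel = lookup(url) or {}
    # Widen the pivot with related indicators, if the source provides any.
    indicators = {url, *intel.get("related", [])}
    hits = sorted({(e.user, e.host, e.url) for e in events if e.url in indicators})
    return intel, hits
```

For a dry run, a plain dictionary's `get` method can serve as `lookup`; a real integration would wrap whichever vendor client the team actually uses behind the same one-argument function.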
Methodologically speaking, hunters can find much more by investigating data with open-ended questions instead of rigid, predefined queries. Why is that?
Consider how someone might search the Internet for information on training a new puppy. What if the user began by opening a browser and writing if/then statements filled with filters to find the pages most relevant to dog training? That would require a full understanding of the structure of each potential page, its content and countless other variables. In short, the search would be infeasible. Even if the user did find something, she could be missing critical information that didn’t match the query.
Now consider what happens when our intrepid user searches with Google. She simply types a few phrases, and Google uncannily delivers the pages of greatest relevance. This is open-ended search: our user does not need to know the destination, its content or its structure in advance. Google uses a tokenizing technique to cluster what is most relevant (even when only the gist is similar) and return boundless results, ranked by likelihood.
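Open-ended, ranked retrieval over log text can be illustrated with TF-IDF scoring, a classic information-retrieval technique used here as a simple stand-in; it is not a description of how Google ranks pages. The sketch assumes log lines are available as plain strings, and the tokenizer and smoothing formula are illustrative choices. The query is free text: there are no field filters or if/then conditions, and any line sharing vocabulary with the query can surface, ranked by score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def search(query, docs, top_k=3):
    """Rank documents by the summed TF-IDF weight of the free-text query terms.

    No schema or filters are required: any document that shares at least one
    token with the query receives a positive score and can appear in results.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    terms = tokenize(query)

    scored = []
    for i, c in enumerate(counts):
        length = sum(c.values()) or 1
        # Term frequency times smoothed inverse document frequency; rare terms weigh more.
        score = sum(
            (c[t] / length) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t in terms
            if t in c
        )
        if score > 0:
            scored.append((score, i))

    # Highest score first; ties keep original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [docs[i] for _, i in scored[:top_k]]
```

Searching `"powershell encoded"` against a handful of log lines returns the line that mentions both terms first and a line that mentions only PowerShell second, without the hunter specifying which field either term lives in.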
Let’s apply this same method to security. Hunters need the opportunity to have an open-ended dialogue with data – assembled, clustered and externally informed data. Once we have this in place, we are on our way to hunting excellence.
Getting to Carnegie Hall
There’s a clever quip that goes, “How do you get to Carnegie Hall?” The answer: Practice, practice, practice. I have outlined the principles of a threat hunting program, and I have stated unapologetically that the human brain is our best resource for achieving the results we crave. Now, it’s time to put them into practice.
Security professionals have tremendous domain expertise. By now you should see that we are climbing the escalator of hunting techniques to unleash that expertise.
- For the most part, you do not need more data – assemble it effectively
- Cluster that data through tokens and attribution – metadata enrichment
- Use external sources – an assumption of compromise requires humility
- Open-ended search – the only way to find the unknown
Customers of FireMon’s Immediate Insight have grown their competency in all areas of threat hunting through automated data assembly, machine learning attribution and natural language extraction, integrating external sources and giving hunting teams the platform for open-ended search. Join us for our webinar, Data Orchestration for Incident Response, on Thursday, March 30, at 2:00 p.m. CDT.