Orchestrating Data Assembly for Security Analysis

As the need for security increases, so does our need for data; however, getting from raw, siloed data to something actionable is too difficult. I believe the industry spends a lot of time discussing the outcomes of data analysis but not enough time discussing why that analysis is so difficult. The following is a short exploration of why improving and accelerating data analysis is so vital, along with some new approaches I believe can relieve the strain on security teams.

The volume of security alerts far exceeds the security team’s capacity to assess whether they indicate true security incidents or false positives. New infrastructure paradigms, such as cloud/mobile-centric architectures and dynamic-by-design infrastructures (e.g. SDN), will further strain organizations’ data analysis capacity. Combine this with a more sophisticated, determined adversary and an avalanche of data, and it’s clear that threat detection and response demands will continue to exceed the organization’s analysis and response capacity.

I don’t think there’s any controversy about the increasing challenge of detection and response. A recent report from Gartner forecasted that investment in detection and response technologies will surpass prevention as soon as 2020.

The analysis of whether an alert is a real incident typically starts with an effort to prepare data (aka data wrangling or data assembly). This is commonly a difficult and time-consuming process, with industry studies reporting that data scientists spend around 80% of their time preparing and managing data for analysis. With so much effort required just to assemble the data, it’s not surprising that short-staffed teams aren’t able to investigate the majority of security alerts.

Immediate Insight’s architecture and process for data ingestion can significantly accelerate the data assembly process and offer a new level of integration and analysis flexibility. Its natural language data ingestion, entity extraction, and enrichment do not require parsing, so advance knowledge of the data’s structure is not needed. This enables the system and process to include structured, semi-structured, and unstructured data.
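To make the idea of pulling entities out of data without a format-specific parser more concrete, here is a minimal Python sketch. It is not Immediate Insight’s implementation, which the text above does not detail; it is a simple regular-expression stand-in, and the pattern names and the extract_entities function are assumptions made purely for illustration.

```python
import re

# Illustrative patterns only (assumptions, not the product's method).
# The IPv4 pattern does not check that each octet is in the 0-255 range.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_entities(text: str) -> dict:
    """Find entities in a line of text regardless of how the line is laid out."""
    return {
        "ipv4": IPV4.findall(text),
        "email": EMAIL.findall(text),
    }


if __name__ == "__main__":
    # One free-text line and one JSON-like line, handled by the same code.
    print(extract_entities("Failed login for alice@example.com from 198.51.100.23 at 10:42"))
    print(extract_entities('{"src":"10.0.0.5","user":"bob@corp.example"}'))
```

The same function handles both the free-text and the JSON-like line because it looks for the entities themselves and makes no assumption about field positions or delimiters.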

By combining its parsing-free natural-language data ingestion with an open API, Active Collectors, and Get Insight features, the system can easily be integrated into security operations processes and tool workflows. The following are some specific examples for Palo Alto Networks security systems.

“Active Collectors” allow you to call external third-party services. They support most Linux scripting languages, including Bash and Python, which enables support for most APIs and web services, like Palo Alto Networks’ AutoFocus. They can be invoked automatically, based on conditions (e.g. “every time you see a critical alert, automatically send a lookup query to Palo Alto AutoFocus and then save the result so investigators have it at the ready”), or on demand, so that an investigator can manually request data that might be too contextual or too expensive to store continuously.
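As a rough illustration of the condition-based invocation described above, the following Python sketch shows the general shape such a collector script might take. The event fields (severity, source_ip), the function names, and the fake_lookup stand-in are all assumptions made for this example; they are not Immediate Insight’s or AutoFocus’s actual interfaces, and a real collector would replace fake_lookup with a call to the service’s documented API.

```python
import json
from typing import Callable, Optional


def should_enrich(event: dict) -> bool:
    """Trigger condition: only critical alerts are enriched automatically."""
    return str(event.get("severity", "")).lower() == "critical"


def collect(event: dict, lookup: Callable[[str], dict]) -> Optional[dict]:
    """Run the lookup for a qualifying event and return a record to store."""
    if not should_enrich(event):
        return None
    indicator = event.get("source_ip")  # hypothetical field name
    if not indicator:
        return None
    return {"indicator": indicator, "enrichment": lookup(indicator)}


def fake_lookup(indicator: str) -> dict:
    """Stand-in for a real threat-intelligence query; returns canned data."""
    return {"verdict": "unknown", "queried": indicator}


if __name__ == "__main__":
    alert = {"severity": "critical", "source_ip": "203.0.113.7"}
    print(json.dumps(collect(alert, fake_lookup)))
```

Passing the lookup in as a function keeps the trigger logic separate from the service call, so the same condition could drive a different lookup service, or the lookup could be run on demand without the condition.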

The “Get Insight” feature allows you to connect a lookup service or action, configured as an Active Collector. Once the service call has been configured, an investigator can trigger the action simply by clicking the (i) button next to an entity or by selecting “Get Insight” from the one-click analytics menu displayed upon mouse-over of the target event. These buttons take the context of the selected event, prompt for any additional required information, and then invoke an action, such as adding an address or URL to a Palo Alto Networks firewall’s dynamic block list to quarantine a host or web site.
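The block-list action mentioned above could be sketched in Python along the following lines. This is a minimal sketch under a stated assumption: that the block list is kept as a plain-text file with one entry per line, which the firewall reads from wherever that file is published. The add_to_block_list function and the file name are hypothetical, and the sketch handles only IP addresses, not URLs.

```python
import ipaddress
import tempfile
from pathlib import Path


def add_to_block_list(path: Path, address: str) -> bool:
    """Append an IP address to the list file; return False if already listed.

    Raises ValueError if the input is not a valid IPv4 or IPv6 address.
    """
    normalized = str(ipaddress.ip_address(address.strip()))
    text = path.read_text() if path.exists() else ""
    entries = {line.strip() for line in text.splitlines() if line.strip()}
    if normalized in entries:
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{separator}{normalized}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        block_list = Path(tmp) / "blocklist.txt"  # hypothetical file name
        print(add_to_block_list(block_list, "203.0.113.7"))
        print(add_to_block_list(block_list, "203.0.113.7"))
        print(block_list.read_text())
```

Validating the address before writing and skipping duplicates keeps a mistyped or repeated request from corrupting or bloating the list.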

The process described for both the Active Collector and Get Insight examples is manual, but can be automated as well.

When combined with API-enabled security solutions like those from Palo Alto Networks, Immediate Insight’s flexible natural language data ingestion can significantly streamline data analysis, enabling teams to scale their investigation efforts.