In order to deliver predictive threat protection to our customers, the Umbrella Security Labs research team has to collect and correlate data from various sources in innovative ways. We’ve shared in previous posts how our team applies proprietary algorithms to data from the OpenDNS Global Network, but we’re constantly on the hunt for easy-to-use data platforms that allow for real-time and interactive data visibility.
That’s why we wanted to share a bit about our experience with Splunk, a big data management system that provides fast machine data parsing, indexing, searching, and analysis. The GUI, dashboards, and availability of security-related add-ons make for a neat out-of-the-box solution for enhanced data visibility.
Splunk Basic Usage
Installation of Splunk base is rather straightforward; check out the official docs for installation instructions. When you’re getting started, these are the basic ways to use Splunk: adding data (data input), searching, deleting, aggregating, transforming, and charting.
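If you prefer the command line, a minimal sketch of the first few of these looks like the following (the paths and sourcetype names are placeholders, not our actual setup):

./splunk add oneshot /var/log/myapp/events.log -sourcetype myapp
./splunk search 'sourcetype=myapp | head 10'
./splunk search 'sourcetype=myapp_old | delete'

Note that the delete command only hides events from search results; it requires the can_delete role and does not reclaim disk space.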
If you’re using customized data, you’ll likely find input to be the trickiest part. That’s where Splunk has to figure out the correct data format and properly parse it to extract fields. Splunk tries to automatically break the raw blob of textual input into events based on default or customized event-breaking settings, and to recognize the timestamp for each event. These settings can be customized via either the Splunk GUI or the command line interface (CLI). Make changes to the props.conf file to tell Splunk how to treat your data.
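As a rough sketch (the sourcetype name and timestamp format here are made-up placeholders), a props.conf stanza that breaks events on newlines and pins down the timestamp might look like:

[my_custom_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

Getting LINE_BREAKER and TIME_FORMAT right up front saves a lot of re-indexing pain later.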
An example of extracting tab-delimited fields from input data:
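The exact configuration depends on your data; as a minimal sketch (the sourcetype and field names are placeholders), props.conf attaches a field-extraction report to the sourcetype:

[my_custom_sourcetype]
REPORT-tabfields = tab_delimited_extraction

and transforms.conf defines the delimiter and the field names, in column order:

[tab_delimited_extraction]
DELIMS = "\t"
FIELDS = "timestamp", "client_ip", "domain", "response"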
For data queries and other operations (aggregation, transformation, etc.), Splunk’s pipe syntax is pretty straightforward.
The following query, which maps out IP addresses that fit certain criteria, serves as a good example of basic query syntax. It requires the GeoIP mapping app provided by MaxMind and amMap, a mapping app.
sourcetype=mute*
| rex "(?<ip>\d+\.\d+\.\d+\.\d+)"
| search ip!=192.168* ip!=0.0.* ip!=10.*
| stats count by ip
| eval count_label="Event"
| eval iterator="ip"
| eval iterator_label="IP"
| eval zoom = "zoom=\"334%\" zoom_x=\"-128.58%\" zoom_y=\"-113.11%\""
| eval movie_color="#FF0000"
| eval output_file="home_threat_data.xml"
| eval app="amMap"
| lookup geoip clientip as ip
| search client_country!=^$
| mapit
Splunk data forwarding and receiving
Install the universal forwarder if you have remote data. The universal forwarder gathers data from the servers where your input data reside and forwards it to your main Splunk server for indexing and searching.
./splunk add forward-server [splunk server:port]
/opt/splunkforwarder/bin/splunk add monitor /path/to/app/logs/ -index main -sourcetype %app%
At the same time, enable the receiver (the main Splunk server and indexer) via the Splunk GUI, under Forwarding and receiving -> Add new -> TCP port [port].
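Receiving can also be enabled from the CLI on the main Splunk server (9997 is merely the conventional receiving port):

./splunk enable listen 9997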
To troubleshoot the deployment, check these internal logs at the receiving indexer:
$SPLUNK_HOME/var/log/splunk/splunkd.log
$SPLUNK_HOME/var/log/splunk/license_audit.log
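Since Splunk indexes its own logs into the _internal index, you can also check forwarding health from the search bar; for example, to summarize splunkd errors by component:

index=_internal source=*splunkd.log* log_level=ERROR | stats count by component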
Use cases for Splunk security apps
Splunk base offers a set of charting choices. In the following example, we made a pie chart of the user agent distribution in our mobile client data.
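The search behind such a chart is a one-line aggregation; as a sketch (the sourcetype and field name are assumptions for illustration):

sourcetype=mobile_clients | stats count by useragent

Rendering the result as a pie chart is then a single click in the chart type selector.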
The Snort app has been a great tool for quick network threat monitoring and alerting. We can easily retrieve all the entries that triggered Snort, and perform in-depth investigations given the source IP addresses and contextual network data.
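For example, a search along these lines (the field names assume the Snort app’s extractions) surfaces the most frequently triggered signatures and their sources:

sourcetype=snort | stats count by signature, src_ip | sort -count | head 20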
Snort and amMap make use of MaxMind’s GeoIP mapping to give us an instant global look at a threat’s scale and spreading patterns.
Conclusion
We have yet to explore Splunk’s other interesting capabilities, such as real-time correlation and alerting, or its distributed deployment scheme (with Hadoop integration). We’ve spent lots of time with Hadoop and HBase, which are largely back-end systems. As far as our primitive use of Splunk goes, it serves quite well as a front-end portal for internal search, query, and reporting. Data parsing for customized data, however, is not as intuitive; it would be great if Splunk provided pipe-like syntax for data input as well.