Alter Splunk Data at Indexing Time

Splunk is a great tool for data analysis and monitoring. It comes with its own web interface, which exposes almost all the settings needed to configure a Splunk environment. In my case, I ingest application logs into Splunk via a log4j connector. With detailed application events in Splunk, it is easier to analyze application behavior (both real-time and historical) and to identify defects during the different stages of development. And in the big show, when your application is live, you need to observe application statistics while maintaining data security. If we write any sensitive information to application logs, we MUST mask it before storing it, to be on the safe side. In this post I'll talk about masking log data before it is written to a Splunk index.

What happens after the data is altered?

Simply put, your data becomes something else.

Before altering:

<username>chandika</username>

After altering:

<username>******</username>

Where should I do the magic?

You cannot do this through the Splunk web interface. You need to edit configuration files; the relevant ones are props.conf and transforms.conf.

But I’m going to use props.conf only. Also, you will find more than one props.conf file in a Splunk installation.

For example:

$SPLUNK_HOME/etc/system/default/props.conf

$SPLUNK_HOME/etc/system/local/props.conf

$SPLUNK_HOME/etc/apps/search/local/props.conf

$SPLUNK_HOME/etc/apps/search/default/props.conf

Splunk advises us not to change props.conf in a default folder. Make your changes in the local folder instead. If the local folder does not exist, create it and add a props.conf that contains only the settings you want to change; settings in local override those in default. Also, the props.conf in $SPLUNK_HOME/etc/apps/search contains configurations for the search app only. If you need your change to apply to the entire Splunk instance, use the props.conf in $SPLUNK_HOME/etc/system/local.

Where should I make the change in props.conf?

Open the props.conf file and you’ll see some lines like [log4j], [log4php], [catalina]. Those are source types. A source type determines how Splunk Enterprise formats the data during the indexing process. You can limit your changes to only one source type. For example, if you are going to alter data in the log4j source type, add your commands under [log4j]. But if you need to do your magic for all the source types, you don’t need to add your commands to each one. You can add those common commands under [default]. You’ll find it at the top of props.conf.

How do I change the data? What are the commands?

There are two props.conf settings you can use to alter data.

1. SEDCMD

SEDCMD works like the sed command in Linux, but it supports only a subset of sed's features. SEDCMD is a fast way to alter data. However, you’ll have to restart Splunk to apply new sed commands.

2. EVAL

Eval is a great command for processing data while searching. It offers many functions, such as replace, if, split, searchmatch, and like. Also, you don’t need to restart Splunk to apply new eval commands. Keep in mind that an EVAL- setting in props.conf defines a calculated field that is evaluated at search time, so unlike SEDCMD it does not change what is stored in the index.

Also, in a Splunk event, _raw is the field that contains the full raw text. You will need it to alter the whole event.

SEDCMD

SEDCMD applies a sed-style substitution to the raw text of each event (_raw) at index time. The part after "SEDCMD-" is only a name for the rule; I use _raw as the name here.

You’ll need some regex knowledge to do this. A sed substitution looks like this:

s/regexp/replacement/flags

Not many flags are available for sed commands in Splunk, but you can use the g flag to replace all occurrences.

For example, suppose you need to mask email addresses. You can do it as below.

SEDCMD-_raw = s/([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})/##email##/g
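Before putting a pattern like this into props.conf, it can help to preview what it matches. Below is a minimal sketch that uses Python's re module as a stand-in for Splunk's regex engine; the two engines are not identical, so treat the result as an approximation. The function name mask_emails and the sample event are made up for illustration.

```python
import re

# Same pattern as in the SEDCMD line above. Python's re module is only a
# stand-in here; Splunk's regex engine may differ in edge cases.
EMAIL_PATTERN = re.compile(
    r"([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})"
)


def mask_emails(raw_event: str) -> str:
    """Replace every email-like substring, like the sed 'g' flag does."""
    return EMAIL_PATTERN.sub("##email##", raw_event)


# Hypothetical log line, not taken from a real application.
sample = "login ok for john.doe@example.com, cc jane@test.org"
print(mask_emails(sample))  # login ok for ##email##, cc ##email##
```

Running it against a handful of real log lines is a cheap way to catch a pattern that masks too much or too little before restarting Splunk.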

If you need to mask the email only in events that contain “USER_DATA”, you can do it as follows.

SEDCMD-_raw = s/(USER_DATA.*)([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})/\1##email##/g

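Here \1 represents the first regex group, so the text captured before the email is written back unchanged. You can also reorder groups in the replacement this way. But this pattern can replace only one email per event. Because ".*" is greedy, the single match stretches from USER_DATA to the last email in the event, and it can even swallow the leading characters of that address, leaving them unmasked. If you want to replace all the emails in such events, I recommend using EVAL.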
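The greedy behavior is easy to see with a quick experiment. This is a sketch under the assumption that Python's re module backtracks the same way as Splunk's engine for this pattern; the function name and the sample event are hypothetical.

```python
import re

EMAIL = r"[a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}"

# Same shape as the conditional SEDCMD above: group 1 is "USER_DATA.*",
# group 2 is the email. The ".*" is greedy.
GREEDY = re.compile(r"(USER_DATA.*)(" + EMAIL + r")")


def mask_after_user_data(raw_event: str) -> str:
    """Apply the substitution the way s/.../\\1##email##/g would."""
    return GREEDY.sub(r"\1##email##", raw_event)


# Hypothetical event with two addresses.
event = "USER_DATA alice@example.com bob@example.org"
print(mask_after_user_data(event))
# USER_DATA alice@example.com bo##email##
```

The first address is untouched, and the greedy ".*" has also consumed "bo" from the second one, so only "b@example.org" is treated as the email. That is why a single conditional sed expression is a poor fit when an event can hold several addresses.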

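If you want to add multiple sed commands, add one line per command and give each line its own name after "SEDCMD-". If two lines in the same stanza use the same name, the later one replaces the earlier one.

SEDCMD-mask_email = s/<email>[^<]*<\/email>/<email>****<\/email>/g
SEDCMD-mask_username = s/<username>[^<]*<\/username>/<username>****<\/username>/g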
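The combined effect of several substitutions on one event can be previewed with a short script. This is only a sketch: Python's re module stands in for Splunk, the rules are applied one after another in list order as an assumption for the preview, and the sample XML is invented.

```python
import re

# One (pattern, replacement) pair per sed command. "/" needs no escaping
# in Python because it is not a delimiter here.
RULES = [
    (re.compile(r"<email>[^<]*</email>"), "<email>****</email>"),
    (re.compile(r"<username>[^<]*</username>"), "<username>****</username>"),
]


def apply_rules(raw_event: str) -> str:
    """Run every substitution over the event, one after another."""
    for pattern, replacement in RULES:
        raw_event = pattern.sub(replacement, raw_event)
    return raw_event


# Hypothetical event body.
event = "<user><username>chandika</username><email>c@example.com</email></user>"
print(apply_rules(event))
# <user><username>****</username><email>****</email></user>
```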

EVAL

Eval is an easier way to do a lot of work, but you need to know the eval functions. You can alter an event like below.

EVAL-_raw = replace(_raw, "([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})", "#email1#")

Note: You can use only one line per field. For example, you may need to alter both the email and username XML tags. But you can’t do it as below.

EVAL-_raw = replace(_raw, "<email>[^<]*<\/email>", "<email>****<\/email>")
EVAL-_raw = replace(_raw,"<username>[^<]*<\/username>", "<username>@@@@<\/username>")

With SEDCMD you can add several lines, because each one can have its own name and all of them act on _raw. With EVAL, the part after "EVAL-" is the field name, so only one line is allowed per field. Here both lines try to change the same field, “_raw”.

So you’ll need to put all your conditions and replacements into a single line. When there are many rules, I suggest writing a simple script in Python or any other language to generate that line. You can do the above task as follows.

EVAL-_raw = replace(replace(_raw,"<username>[^<]*<\/username>", "<username>@@@@<\/username>"), "<email>[^<]*<\/email>", "<email>****<\/email>")
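A generator script for that single line could look like the sketch below. The function name build_eval_line and the rules list are my own for illustration; the script only builds text and does not validate the regexes or escape double quotes inside them.

```python
def build_eval_line(rules, field="_raw"):
    """Nest one replace() call per rule and return a single EVAL line.

    rules is a list of (regex, replacement) strings, applied in order:
    the first rule ends up innermost. Double quotes inside a pattern are
    NOT escaped by this sketch.
    """
    expression = field
    for pattern, replacement in rules:
        expression = f'replace({expression}, "{pattern}", "{replacement}")'
    return f"EVAL-{field} = {expression}"


# The two rules from the example above, written as raw strings so the
# backslashes reach the output unchanged.
rules = [
    (r"<username>[^<]*<\/username>", r"<username>@@@@<\/username>"),
    (r"<email>[^<]*<\/email>", r"<email>****<\/email>"),
]

print(build_eval_line(rules))
```

Keeping the rules in a list makes it easy to add or remove a mask later without hand-editing a deeply nested expression.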

You can also use conditions with eval.

EVAL-_raw = if(searchmatch("USER_DATA"),replace(_raw, "([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})", "#email1#"), _raw)

Here the email is replaced only if the event contains “USER_DATA”. Unlike the conditional SEDCMD, this replaces all the emails in the event.
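The logic of that conditional can be mirrored in a few lines to check it against sample events. This is a rough sketch: a plain substring test stands in for searchmatch, which follows Splunk search semantics and may match differently (for example with respect to case), and Python's re module stands in for replace. The names and sample events are hypothetical.

```python
import re

EMAIL_PATTERN = re.compile(
    r"[a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}"
)


def mask_if_user_data(raw_event: str) -> str:
    """Mask all emails, but only in events that mention USER_DATA."""
    # Simplification: a case-sensitive substring check approximates
    # searchmatch("USER_DATA").
    if "USER_DATA" in raw_event:
        return EMAIL_PATTERN.sub("#email1#", raw_event)
    return raw_event


print(mask_if_user_data("USER_DATA alice@example.com bob@example.org"))
# USER_DATA #email1# #email1#
print(mask_if_user_data("PAYMENT alice@example.com"))
# PAYMENT alice@example.com
```

Separating the condition from the replacement is what lets every address be masked, since the email pattern no longer has to carry the "USER_DATA" prefix inside the same match.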

Apply the changes

If you used SEDCMD, you’ll have to restart Splunk. You can do it as below.

1. In the web interface
  • Go to Splunk Web
  • Go to Settings > Server Controls
  • Select “Restart Splunk”

2. In a terminal

$SPLUNK_HOME/bin/splunk restart

$SPLUNK_HOME is usually located at “/opt/splunk”.

If you used the EVAL command, you don’t need to restart. A restart also does the job, but it is unnecessary.

Instead, you can refresh Splunk's configuration using the URL below.

http://<splunkserver>:8000/en-US/debug/refresh

You’ll need an admin account for this.

Also, I would like to express my gratitude to Uthpala Pahalavithana for the guidance and support, and for introducing me to Medium.

Chandika Udaya Kumara
love programming, automate things, explore technical stuff and study Buddhism