
Alter Splunk Data at Indexing Time
Splunk is a great tool for data analysis and monitoring. It comes with its own web interface, which exposes almost all the settings needed to configure a Splunk environment. In my case, I ingested application logs into Splunk via a log4j connector. With detailed application events in Splunk, it is easier to analyze application behavior (both real-time and historical) and to identify defects during different stages of development. And in the big show, when your application is live, you need to observe its statistics while maintaining data security. If we write any sensitive information to application logs, we MUST mask it before storing it, to be on the safe side. In this post I'll talk about masking log data before it is written to the Splunk index.
What happens after the data is altered?
Simply put, your data becomes something else.
Before altering:
<username>chandika</username>
After altering:
<username>******</username>
Where should I do the magic?
You cannot do this using the Splunk web interface; you need to edit configuration files. The relevant files are props.conf and transforms.conf, but I'm going to use props.conf only. You will also find more than one props.conf file in Splunk, for example:
$SPLUNK_HOME/etc/system/default/props.conf
$SPLUNK_HOME/etc/system/local/props.conf
$SPLUNK_HOME/etc/apps/search/local/props.conf
$SPLUNK_HOME/etc/apps/search/default/props.conf
Splunk advises us not to change props.conf in a default folder. Make your changes in the local folder instead. If the local folder does not exist, create it and add a props.conf there that contains only the settings you want to override. Also, the props.conf in $SPLUNK_HOME/etc/apps/search contains configuration for the search app only. If your change should apply to the entire Splunk instance, use the props.conf in $SPLUNK_HOME/etc/system.
Where should I make the change in props.conf?
Open the props.conf file and you'll see lines like [log4j], [log4php] and [catalina]. Those are source types. A source type determines how Splunk Enterprise formats the data during the indexing process. You can limit your changes to a single source type. For example, if you are going to alter data in the log4j source type, add your settings under [log4j]. If you want your change to apply to all source types, you don't need to repeat the settings under each one. You can add the common settings under [default], which you'll find at the top of props.conf.
How do I change the data? What are the commands?
There are two settings you can use to alter data.
1. SEDCMD
SEDCMD works like the sed command in Linux, but it supports only a small subset of sed's features. SEDCMD is a fast way to alter data, but you'll have to restart Splunk to apply new sed commands.
2. EVAL
Eval is a great command for processing data while searching. It offers many functions, such as replace, if, split, searchmatch and like. You also don't need to restart Splunk to apply new eval commands. Keep in mind that an EVAL- setting in props.conf defines a calculated field, which Splunk evaluates at search time, so the data stored in the index itself is not changed. That is also why no restart is needed.
Also, in a Splunk event, _raw is the field that contains the full raw text of the event. You will need it to alter the whole event.
SEDCMD
SEDCMD rewrites the raw event text (_raw) before it is indexed, which makes it a good fit for masking. The part after "SEDCMD-" is just a class name that you choose to label the rule.
You'll need some regex knowledge to do this. A sed substitution looks like this:
s/regexp/replacement/flags
Not many flags are available for sed commands in Splunk, but you can use the g flag to replace all occurrences.
For example, suppose you need to mask email addresses. You can do it as below.
SEDCMD-_raw = s/([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})/##email##/g
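Since every SEDCMD change needs a restart, I find it handy to dry-run the regex first. Below is a minimal sketch that does this with Python's re module, not with Splunk itself. The function name mask_emails and the sample event are made up for illustration, and I'm assuming Python's regex engine treats this particular pattern the same way Splunk's does.

```python
import re

# Same pattern as in the SEDCMD line, written as a Python raw string.
EMAIL_RE = re.compile(
    r"([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})"
)


def mask_emails(event: str) -> str:
    """Replace every email address in the event, like the g flag does."""
    return EMAIL_RE.sub("##email##", event)


# Hypothetical sample event, not taken from a real log.
sample = "login ok for chandika@example.com, cc admin@example.org"
print(mask_emails(sample))
# login ok for ##email##, cc ##email##
```

If the printed line still shows an address, the pattern needs work before it goes into props.conf.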
If you need to mask the email only in events that contain “USER_DATA”, you can do it as follows.
SEDCMD-_raw = s/(USER_DATA.*)([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})/\1##email##/g
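Here \1 refers to the first regex group, so the text captured before the email is written back unchanged. You can also reorder groups in the replacement this way. However, this pattern can replace only one email per event. The .* in the first group is greedy and the event contains “USER_DATA” only once, so the pattern matches a single time even with the g flag. If you want to replace all the emails in such an event, I recommend using EVAL.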
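To see which email the grouped pattern really touches, here is a small sketch using Python's re module as a stand-in for Splunk. The function names and the sample event are hypothetical. In Python, the greedy .* backtracks from the end of the event, so only the last address is hit and part of its local name can be left behind. A lazy .*? masks the first address completely instead. I have not confirmed that Splunk behaves identically, so treat this as a hint to test your own events.

```python
import re

EMAIL = r"[a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}"

# Greedy first group, as in the SEDCMD line.
GREEDY_RE = re.compile(r"(USER_DATA.*)(" + EMAIL + r")")
# Lazy variant (an assumption of mine, not part of the original command).
LAZY_RE = re.compile(r"(USER_DATA.*?)(" + EMAIL + r")")


def mask_greedy(event: str) -> str:
    """Apply the greedy pattern; \\1 puts the captured prefix back."""
    return GREEDY_RE.sub(r"\1##email##", event)


def mask_lazy(event: str) -> str:
    """Apply the lazy pattern; it stops at the first email it can match."""
    return LAZY_RE.sub(r"\1##email##", event)


# Hypothetical sample event with two addresses.
event = "USER_DATA first@x.com second@y.org"
print(mask_greedy(event))
# USER_DATA first@x.com secon##email##
print(mask_lazy(event))
# USER_DATA ##email## second@y.org
```

Either way only one address is replaced per event, which is the limitation described above.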
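If you want to add multiple sed commands, add one line per command and give each line its own class name. If two lines in the same stanza use the same name, the later one overrides the earlier one.
SEDCMD-mask_email = s/<email>[^<]*<\/email>/<email>****<\/email>/g
SEDCMD-mask_username = s/<username>[^<]*<\/username>/<username>****<\/username>/g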
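The two XML masks can be dry-run the same way. This is only a sketch in Python, assuming the rules are applied one after another to the same event. The RULES list and the apply_rules helper are my own names, and the order Splunk uses is not something this sketch proves.

```python
import re

# (regex, replacement) pairs mirroring the two sed commands.
# The "/" needs no escaping here because Python has no s/.../.../ delimiter.
RULES = [
    (r"<email>[^<]*</email>", "<email>****</email>"),
    (r"<username>[^<]*</username>", "<username>****</username>"),
]


def apply_rules(event: str, rules=RULES) -> str:
    """Run each substitution in turn over the whole event."""
    for pattern, replacement in rules:
        event = re.sub(pattern, replacement, event)
    return event


# Hypothetical sample event.
event = "<username>chandika</username><email>c@x.com</email>"
print(apply_rules(event))
# <username>****</username><email>****</email>
```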
EVAL
Eval is an easier way to get a lot done, but you need to know the eval functions. You can alter an event like below.
EVAL-_raw = replace(_raw, "([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})", "#email1#")
Note: You can use only one line per field. For example, you may need to mask both the email and the username XML tags, but you can't do it as below.
EVAL-_raw = replace(_raw, "<email>[^<]*<\/email>", "<email>****<\/email>")
EVAL-_raw = replace(_raw,"<username>[^<]*<\/username>", "<username>@@@@<\/username>")
You can do this with sed commands, but eval allows only one line per field, and here both lines set the same field, _raw.
So you'll need to put all your conditions and replacements into a single line. If you have many rules, I suggest writing a simple script, in Python or any other language, to generate that line. You can do the above task as follows.
EVAL-_raw = replace(replace(_raw,"<username>[^<]*<\/username>", "<username>@@@@<\/username>"), "<email>[^<]*<\/email>", "<email>****<\/email>")
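A script for building that single line could look like the sketch below. The function build_nested_replace and the rules list are hypothetical names of mine. I'm assuming the regexes and replacements contain no double quotes, since the script does no escaping.

```python
def build_nested_replace(rules, field="_raw"):
    """Wrap the field in one replace() call per (regex, replacement) rule.

    The first rule ends up innermost, so it is applied first.
    Assumes no rule contains a double quote (nothing is escaped here).
    """
    expr = field
    for regex, replacement in rules:
        expr = 'replace({}, "{}", "{}")'.format(expr, regex, replacement)
    return expr


# The same two rules as in the hand-written line.
rules = [
    (r"<username>[^<]*<\/username>", r"<username>@@@@<\/username>"),
    (r"<email>[^<]*<\/email>", r"<email>****<\/email>"),
]

print("EVAL-_raw = " + build_nested_replace(rules))
```

Adding a third mask is then just one more tuple in the list, instead of another layer of brackets to count by hand.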
You can also use conditions with eval.
EVAL-_raw = if(searchmatch("USER_DATA"),replace(_raw, "([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})", "#email1#"), _raw)
Here the email is replaced only if the event contains “USER_DATA”. Unlike the sed version, this replaces all the emails in the event.
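The logic of that conditional line can be sketched in Python as well. This is an approximation. A plain substring check stands in for searchmatch, which may match differently (for example on letter case), and mask_if_user_data is a name I made up.

```python
import re

EMAIL_RE = re.compile(
    r"([a-zA-Z0-9][a-zA-Z0-9\-\+_\.]*@[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,})"
)


def mask_if_user_data(event: str) -> str:
    """Mask every email, but only in events that mention USER_DATA."""
    if "USER_DATA" in event:  # rough stand-in for searchmatch("USER_DATA")
        return EMAIL_RE.sub("#email1#", event)
    return event  # mirrors the final _raw argument of if()


# Hypothetical sample events.
print(mask_if_user_data("USER_DATA a@x.com b@y.org"))
# USER_DATA #email1# #email1#
print(mask_if_user_data("OTHER a@x.com"))
# OTHER a@x.com
```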
Apply the changes
If you used SEDCMD, you'll have to restart Splunk. You can do it in either of two ways.
1. In the web interface
- Go to Splunk Web
- Go to Settings > Server Controls
- Select “Restart Splunk”
2. In the terminal
$SPLUNK_HOME/bin/splunk restart
$SPLUNK_HOME is usually “/opt/splunk”.
If you used EVAL, you don't need to restart. A restart also does the job, but it is unnecessary.
Instead, you can refresh Splunk by using the URL below.
http://<splunkserver>:8000/en-US/debug/refresh
You'll need an admin account for this.
Finally, I would like to thank Uthpala Pahalavithana for the guidance and support, and for introducing me to Medium.