Tuning the Logstash multiline filter

I’m currently working on a project where we are using Apache Mesos and Marathon as our platform to run and provision our applications. We use the ELK stack to receive, store and analyze our logs. On each of the nodes in the cluster we have Filebeat (the successor of Logstash Forwarder) installed, which ships the logs to Logstash.

At first everything looked fine; the logs from all our tasks showed up in Kibana. However, as we added more and more applications to the cluster, we started seeing a problem where log lines from one task would show up in the same event as lines from another task in Kibana.

At first we suspected problems with how we had configured the Filebeat prospector path, which contained multiple wildcards:

/tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/stdout

The source field of these events contained multiple paths, indicating that the message came from two different tasks. A log message should never have more than one source path, so how could this be?

It turned out that the problem was caused by the multiline filter we were using to collapse multiline messages into a single Logstash event.

Looking through the documentation of the multiline filter, we found the stream_identity setting, which determines which stream an event belongs to. By default the stream identity is configured to be %{host}.%{path}.%{type}, but for some reason this was not enough in our case. After appending %{source} to the stream identity the problem disappeared. This is what we added to our multiline filter:

 stream_identity => "%{host}.%{path}.%{type}.%{source}"
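
For context, this is roughly how the setting fits into a multiline filter block. The pattern below is only a placeholder for “a line that starts a new log message”; the exact pattern depends on the log format and is not the point here:

filter {
    multiline {
        # Placeholder: lines that do not start with a date belong to the previous event
        pattern => "^\d{4}-\d{2}-\d{2}"
        negate => true
        what => "previous"
        stream_identity => "%{host}.%{path}.%{type}.%{source}"
    }
}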

With that fixed, we noticed a different problem. Some messages were occasionally tagged with _grokparsefailure, indicating that parsing the log message had failed. Looking closer, this happened when the log message contained long output, like the stack traces from Java exceptions. The start of a message would be parsed correctly, timestamp and all, but the message would be cut off. The following message would not contain a timestamp and would continue where the previous one stopped.

This seemed strange, as this is exactly the job of the multiline filter: it should keep collapsing lines into one message as long as they do not match the specified pattern, which in our case means waiting until a line matches a timestamp indicating a new message.

More reading of the documentation revealed that there is a periodic_flush setting which defaults to true. That is, the multiline filter will periodically flush events based on the max_age setting, which defaults to 5 seconds. Our theory is that, due to processing delay and load on the ELK server, we hit this timeout, causing our messages to be split.

By increasing max_age to 60 seconds, the problem seems to have disappeared:

max_age => 60
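
For completeness, the timeout-related settings go in the same multiline filter block; again a sketch rather than our full configuration:

multiline {
    # pattern, negate, what and stream_identity as before
    max_age => 60              # wait up to 60 seconds before flushing an incomplete event
    # periodic_flush => false  # alternative: never flush on a timer
}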

Being a time-based setting, this isn’t bulletproof, as the results will vary strongly with the load on the server, but for now it seems to do the trick. Another option would be to disable the periodic flush altogether, but that means potentially waiting a very long time for messages from tasks that don’t log very often.

The problems mentioned in this post are some of the things being addressed by the new multiline codec, which aims to be the successor of the now deprecated multiline filter. Unfortunately, changing from the filter to the codec is not trivial in all cases, but I’ll leave that for another post.

C++ to Java: Value semantics

About a year ago I changed jobs. I went from writing modern C++11 on an embedded Linux platform to working as a consultant where the primary language is Java. I thought it might be interesting to write about how this transition has been, the differences between C++ and Java, and what I miss and don’t miss.

I plan to make this a series titled “C++ to Java” and will assume the reader has some knowledge of both C++ and Java.

In C++, when doing an assignment or calling a function, values are copied by default. This makes it possible to reason about parameters as data rather than as the specific instance of an object that holds the data. This is known as value semantics.

In Java, however, if we exclude primitive types like int, char, etc., there are no value types. Every variable or parameter is a reference to an object, which means we always need to be aware of which instance of an object we are working on.

A couple of months before I started the new job I spent some time writing a toy game engine in Java. Part of the code dealt with updating the position of the mouse cursor while moving it around. I had a mouse object which was updated with information like its position, whether it was being pressed, dragged, and so on.

In the code for the game logic I was using this mouse object to implement something like “if the mouse was pressed, move to the position where it was pressed”. While testing my code I clicked around on the screen and my objects moved as expected. However, I suddenly realized that my objects were following my cursor regardless of whether I clicked.

At first I could not understand what was wrong. The code had a clear if statement that would only update the position of the objects if the mouse was being pressed. How could this be? Well, it turns out I thought I still lived in the C++ world where objects are passed as values. Instead of passing the actual position of the mouse as data, I was passing along a reference to data which was updated each game tick. First lesson in Java: create new instances!
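
In hindsight the bug boils down to something like the following sketch. The class and field names are made up for illustration; the Position class is the one shown in the examples below:

class Mouse {
    // The input system updates this same instance in place on every tick.
    final Position position = new Position(0, 0);
    boolean pressed;
}

class GameObject {
    Position target;

    void update(Mouse mouse) {
        if (mouse.pressed) {
            // Bug: target now aliases the mouse's internal position object,
            // so it keeps following the cursor even after the button is released.
            target = mouse.position;

            // Fix: copy the data into a new instance instead:
            // target = new Position(mouse.position.x, mouse.position.y);
        }
    }
}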

Let’s look at some code examples.

Below is some C++ code showing that, when doing an assignment, the data is copied by default:

struct Position
{
    int x;
    int y;
};

int main()
{
    Position p1{10, 20};
    Position p2 = p1;
    p2.x = 0;

    // p1: (x=10, y=20)
    // p2: (x= 0, y=20)
}

The equivalent Java code looks very similar, but behaves differently.

public class Position
{
    public int x;
    public int y;

    public Position(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public static void main(String[] args) {
        Position p1 = new Position(10, 20);
        Position p2 = p1;
        p2.x = 0;

        // p1: (x=0, y=20)
        // p2: (x=0, y=20)
    }
}

As shown, in C++ we have an actual copy of the data; modifying p2 does not modify p1 like it does in Java. In C++, references must be declared explicitly using &. The following C++ code is equivalent to the Java code:

int main()
{
    Position p1{10, 20};
    Position& p2 = p1;
    p2.x = 0;

    // p1: (x=0, y=20)
    // p2: (x=0, y=20)
}

This also applies when calling functions. In C++, when a function takes an object by value as an argument, the whole object is copied onto the stack. This is called “pass by value”, and the object that is passed into the function will remain unchanged. Basically the same thing happens here as with assignment, only that the copy is created by the object’s copy constructor.

void modify(Position p) {
    // p is a copy and only exists in this stack frame
    p.x = 0;  // does not affect the caller's object
}

In Java there is a subtle difference in how this works compared to assignment of variables. A classic example is the following code:

public void foo(Position p) {
    p = new Position(5,5);
}

Position p1 = new Position(0,0);
foo(p1);
// what are p1.x and p1.y?

It turns out that p1 is still (0, 0), which might surprise some, since “in Java everything is a reference”. But this is not entirely true: in Java, references are passed by value, which makes p in foo a copy of the reference to p1 that only exists during the function’s stack frame. Reassigning p therefore does not affect p1; mutating the object that p refers to, on the other hand, would.
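
To make the distinction concrete, here is a small sketch reusing the Position class from above (the method names are just illustrative): reassigning the parameter has no effect on the caller, while mutating the object it refers to does.

public class PassByValueDemo {
    static void reassign(Position p) {
        p = new Position(5, 5);  // rebinds the local copy of the reference only
    }

    static void mutate(Position p) {
        p.x = 5;                 // follows the shared reference and changes the object itself
    }

    public static void main(String[] args) {
        Position p1 = new Position(0, 0);

        reassign(p1);
        // p1: (x=0, y=0)

        mutate(p1);
        // p1: (x=5, y=0)
    }
}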

For a more in-depth discussion on this topic, see http://javadude.com/articles/passbyvalue.htm.