Erlang

Erlang trace files in Wireshark

by Magnus Henoch

If you’ve tried to debug a running Erlang system, you probably tried tracing. The Erlang VM lets you trace many different things, most notably messages and function calls, but also processes getting spawned and killed, and garbage collections.

Perhaps the most common way to use tracing is calling the dbg module from an Erlang shell, tracing calls for a few functions:

%% Start a tracer process
dbg:tracer().
%% Trace function calls for all processes
dbg:p(all, [call, timestamp]).
%% Add a function to trace calls for
dbg:tp(my_module, my_function, cx).

That prints any function calls for that specific function in the shell.

But what if you have a lot of trace events? Digging through shell output isn’t the most pleasant way to find a needle in the haystack.

The dbg module includes support for writing trace events to files and reading them back. To write trace events to a file, use the following command:

dbg:tracer(port, dbg:trace_port(file, "foo.trace")).

Then you can read back the data with dbg:trace_client.

But since the trace file is just a data file, with a fairly simple format, we could just use a tool that specialises in inspecting files full of timestamped events: Wireshark!

Wireshark? Really?

Yes! Wireshark is mostly known for reading files in the PCAP and pcapng formats, but you can add support for other file formats through Lua plugins.

It so happens that I have a plugin that does just that, called cautious-rotary-phone. (Github’s repository name generator sometimes helps imagination-deficient programmers.)

It consists of two modules, erlang-trace.lua and erlterm.lua.

erlang-trace.lua reads the files in the Erlang trace file format. It’s a simple format, with each entry consisting of three elements:

  • one byte, always 0, meaning the start of a binary entry
  • a 32-bit big-endian integer, denoting the size in bytes of the entry
  • the entry itself, in Erlang external term format

The module registers itself as a file handler, and when called, reads the file, chops it up into entries, and extracts the timestamp for each entry. That gives Wireshark enough data to display all the trace entries in its packet list.

Then erlterm.lua gets invoked. It is a protocol dissector, similar to ones you might write for any network protocol, except it dissects Erlang terms instead of network packets. It identifies different types of terms, makes the data available to Wireshark for display and filtering, and dives into recursive data types such as tuples and lists.

So what does it look like?

Let’s try this out on MongooseIM, the instant messaging server! I installed it on my machine, and set out to find something interesting to trace. It so happens that client-to-server (“c2s”) connections are handled by child processes of a supervisor called ejabberd_c2s_sup. We can trace all messages sent and received by any newly spawned child processes using the set_on_spawn trace flag:

dbg:tracer(port, dbg:trace_port(file, "mim.trace")).
dbg:p(ejabberd_c2s_sup, [timestamp, send, 'receive', set_on_spawn]).

Then I started two XMPP clients, registered two accounts, sent messages between them, and then opened the trace file in Wireshark.

I sent a message with the text hello, so I can find the trace events containing that text with this Wireshark filter:

frame contains "hello"

That shows me this event:

Erlang binary term format: {trace_ts, <mongooseim@localhost.869.0>, receive, 2 more elements}
    Version number: 131
    Term type: Small tuple (104)
    Arity: 5
    Tuple element: trace_ts
    Tuple element: <mongooseim@localhost.869.0>
    Tuple element: receive
    Tuple element: {$gen_event, {xmlstreamelement, {xmlel, <<"message">>, [ 4 elements ], [ 2 elements ]}}}
    Tuple element: {1526, 395003, 982214}

So the process 869.0 on the node called mongooseim@localhost has received an XML element called message. The XML element itself exceeds the display threshold, but I can open up the tree element and see more details:

Tuple element: {xmlel, <<"message">>, [ 4 elements ], [ 2 elements ]}
    Term type: Small tuple (104)
    Arity: 4
    Tuple element: xmlel
    Tuple element: <<"message">>
    Tuple element: [ 4 elements ]
        Term type: List (108)
        List length: 4
        List element: {<<"xmlns">>, <<"jabber:client">>}
        List element: {<<"type">>, <<"chat">>}
        List element: {<<"to">>, <<"bbb@localhost">>}
        List element: {<<"id">>, <<"aac6a">>}
        Tail: []
    Tuple element: [ 2 elements ]
        Term type: List (108)
        List length: 2
        List element: {xmlel, <<"body">>, [], [{xmlcdata, <<"hello!">>}]}
        List element: {xmlel, <<"active">>, [{<<"xmlns">>, <<"http://jabber.org/protocol/chatstates">>}], []}
        Tail: []

Here we can see the attributes of the message element, as well as its body child element containing the text I sent, and an active child element representing a “chat state” - this would affect how the recipient’s client displays this contact.

So far so good - but what if we want to see this trace event in context? We can open the <mongooseim@localhost.869.0> item that represents the process, right-click on the process id and select “Apply as Filter” > “Selected”, and Wireshark displays all events that relate to this process.

For example, we see this message, which was neither sent from nor to the process in question, but contains the process id in the message:

Erlang binary term format: {trace_ts, <mongooseim@localhost.432.0>, send, 3 more elements}
    Version number: 131
    Term type: Small tuple (104)
    Arity: 6
    Tuple element: trace_ts
    Tuple element: <mongooseim@localhost.432.0>
    Tuple element: send
    Tuple element: {#Ref<mongooseim@localhost.157787.1506541571.496806730>, {ok, <mongooseim@localhost.869.0>}}
    Tuple element: <mongooseim@localhost.805.0>
    Tuple element: {1526, 394951, 282018}

So process 432.0 sent a message to 805.0. The message contains a reference, and the id of the process we were looking at earlier.

This is a common Erlang pattern: the calling process generates a unique reference, sends it as part of a message, and expects to get the same reference in the response. So let’s try filtering on that reference, and we should find the request that this is a response to! Indeed:

Erlang binary term format: {trace_ts, <mongooseim@localhost.432.0>, receive, 2 more elements}
    Version number: 131
    Term type: Small tuple (104)
    Arity: 5
    Tuple element: trace_ts
    Tuple element: <mongooseim@localhost.432.0>
    Tuple element: receive
    Tuple element: {$gen_call, {<mongooseim@localhost.805.0>, #Ref<mongooseim@localhost.157787.1506541571.496806730>}, 1 more elements}
        Term type: Small tuple (104)
        Arity: 3
        Tuple element: $gen_call
        Tuple element: {<mongooseim@localhost.805.0>, #Ref<mongooseim@localhost.157787.1506541571.496806730>}
        Tuple element: {start_child, [{ejabberd_socket, 1 more elements}, [ 8 elements ]]}
    Tuple element: {1526, 394951, 281726}

So that call says start_child, and mentions ejabberd_socket - we’ve found the call that led to our process being created.

Conclusion

Wireshark has many features aimed at making inspecting network traffic easy, and Erlang trace events are conceptually similar enough that we can make use of this existing tool.

The Lua modules I’m using can certainly be improved, but they are already powerful enough to correlate requests to responses, and find relations between processes. What other features would you like to see?

Go back to the blog

×

Thank you for your message

We sent you a confirmation email to let you know we received it. One of our colleagues will get in touch shortly.
Have a nice day!