Quite often it is more important for you to see the whole picture, rather than to know each case when a particular probe fires. For instance, observing a process which consumes a lot of CPU time according to prstat, we ask ourselves: what is this process doing? And often we think about the overall number of system calls executed by this process, and not every particular one of these system calls.
DTrace aggregations are created specifically for answering such questions. Working with sets of numerical data, aggregation functions help us to easily determing average, min and max values of data sets, count numbers and sums of elements in these data sets, and even build frequency distributions.
More information can be found here, but today I’ll show only few examples of using aggregations.
DTrace stores the results of aggregation functions in objects called aggregations.
The syntax is:
@name[ keys ] = aggfunc ( args );
here’s what this line means:
- name – name of our aggregation
- keys – index keys, very similar to those of associative arrays.
- aggfunc – aggregation function
- args – function arguments, usually it’s a set of numerical data
Today I’ll use sum aggregation function, which as it’s obvious from the name, helps us calculate the sum of a numerical data set,
Only the current value of the aggregation function is stored in the memory, it’s calculated off the previous function argument values. Then the function is calculated for the current value of the argument, and this result is applied to the current aggregation value.
For basic analysis you can use as few aggregations keys as you can – for instance, use only PID, without specifying thread ID as another index key.You can even use execname – but keep in mind that there might be more than one process running on your system with the same name. Still, execname-based aggregation will be quite useful to track what’s getting most of our system’s attention:
Here’s our script:
#!/usr/sbin/dtrace -s #pragma D option quiet syscall:::entry { self->ts=timestamp; } syscall:::return { self->elapsed = timestamp - self->ts; @time[execname]=sum(self->elapsed); } END { printf("n%20s %20sn","Execname", "Time elapsed"); printa("%20s %20@un", @time); }
Save it into a file, do chmod a+x on this file and start it. You’ll get an output similar to this:
Execname Time elapsed xscreensaver 5710 gnome-smproxy 6886 wnck-applet 7735 gnome-panel 52032 metacity 54430 gnome-settings-d 60419 vim 66452 mixer_applet2 670078802 gnome-netstatus- 819836799 nautilus 900025830 Xorg 1074783431 gnome-terminal 1074816530 firefox-bin 2000076871 dtrace 9843796821648
The time in the last column is given in nanoseconds, so don’t be scared with such huge numbers 🙂 For timing some actions in your script timestamp proves to be VERY useful. It’s a nanoseconds counter which increments from an arbitrary point, so it can be used for relative computations only.
Now you can look at the output and notice how much of unnecessary actions is happening on my laptop with started Gnome 😉
If you want a deeper analysis, you can make our script a bit more complex, and you’ll get a powerful tool for determining what process was doing what on our system, and for how long.
#!/usr/sbin/dtrace -s #pragma D option quiet syscall:::entry { self->ts=timestamp; } syscall:::return { self->elapsed = timestamp - self->ts; @time[execname,probefunc,pid,tid]=sum(self->elapsed); } END { printf("n%20s %20s %8s %7s %20sn", "Execname", "Syscall", "Process", "Thread", "Time elapsed"); printa("%20s %20s %8d %7d %20@un", @time); }
If you save this code into syscalls.d and start the script, you’ll probably get something similar (I’ve deleted lots of lines to save up some space):
Execname Syscall Process Thread Time elapsed gnome-terminal read 7199 1 12867 Xorg read 6953 1 13769 dtrace brk 7331 1 33647 dtrace schedctl 7331 1 39320 gnome-terminal write 7199 1 48052 Xorg writev 6953 1 57112 gnome-terminal pollsys 7199 1 646816 dtrace lwp_park 7331 1 44817217 Xorg pollsys 6953 1 46936244 dtrace ioctl 7331 1 9269224066655
If you leave such a script running for a couple of minutes, you’ll get a very long output with the list of every process running on your system all the system calls it’s been executing, with the time spent by each thread of each process.
[…] tips on using global and non-global zones in Solaris, to managing ZFS filesystems, to basic and advanced uses of […]