The JyMob Blog

Choose the Best People.

Top and Java Threads

Introduction

top is a great tool to diagnose a *nix system. The focus of this post is on using it for debugging Java threads on Linux.

First, the environment:

1
2
3
4
5
6
7
8
9
10
11
1) uname -srvp:
  Linux 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64
2) java -version:
  java version "1.8.0_25"
  Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
  Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
3) cat/etc/lsb-release 
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=14.04
  DISTRIB_CODENAME=trusty
  DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"

A thread is a thread of execution. The way it differs from an operating system process (process, for short) is that it shares the address space of a process with other threads, whereas by default, the address space of a process is ‘protected’ from other processes. If you believe in multithreaded programming model, then the sharing of memory is the key strength of (and major source of problems with) threads.

Observation and measurement are critical. Galileo once said, “Measure what is measurable, and make measurable what is not so.”. The Observer Effect, the changes that the act of observation will make on a phenomenon being observed, can be reduced by better instrumentation and observation techniques.

top is one such good tool (although it takes up resources of its own) for *nix systems. It takes a bit of getting used to, but once you are used to it, it is almost indispensable. Until you hit the right keys however, it displays numbers that don’t make sense. This is compounded by the fact that every option usually has a ‘default value’. The default values are great, but the problem with them is that if default value is not something that pleases me, I hate not just the choice of that default, but the software/tool itself (that’s just human psychology)! But believe me, even though you hate to read man pages, coming up with the default value that pleases every user is very hard.

top is for the entire Linux system. It displays the resources used by each process and by the computer. The slowdown of a computer sometimes can be traced back to a few processes/programs running on it. But initial display of top can be daunting for the uninitiated. Hastily, you run top on the command line (and assuming you have no .toprc in your home folder – and if you do, you can skip to the next section), you’ll see something like this (that changes every three seconds):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Tasks: 261 total,   2 running, 259 sleeping,   0 stopped,   0 zombie
%Cpu(s): 26.6 us,  0.3 sy,  0.0 ni, 73.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8139376 total,  5951588 used,  2187788 free,   181948 buffers
KiB Swap:  3227644 total,   218856 used,  3008788 free.   905872 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                             
 7525 kedar     20   0 3462036  24084  10600 S 100.1  0.3   0:34.46 java                                                
 1018 kedar     20   0   86536  33812   3060 S   2.0  0.4  18:29.48 ruby                                                
14857 kedar     20   0  848748 111596  16640 S   1.0  1.4  11:36.79 chrome                                              
14919 kedar     20   0 1515700 378408  34724 S   1.0  4.6  25:18.63 firefox                                             
15054 kedar     20   0  495420  12552   3320 S   1.0  0.2   7:55.82 GoogleTalkPlugi                                     
 1016 kedar     20   0  233896  85508   3980 S   0.7  1.1   6:07.43 ruby                                                
 1020 kedar     20   0  553748  37864   3400 S   0.7  0.5   3:38.07 ruby                                                
 7285 kedar     20   0  983472 284460  40276 S   0.7  3.5   0:21.85 chrome                                              
14880 kedar     20   0  862848 108624  14096 S   0.7  1.3  10:35.25 chrome                                              
15002 kedar     20   0 1028820  74844   7808 S   0.7  0.9   8:57.24 chrome                                              
 1014 kedar     20   0   68032  17496   3004 S   0.3  0.2   3:32.70 ruby                                                
 1299 mysql     20   0  484316  30020   1216 S   0.3  0.4   1:07.24 mysqld     
 ...

There are many references available to decipher the meanings of those lines and you can refer to them by googling around. But remember this: top is highly configurable and most if not all of its controls are (thankfully) toggles! This helps a lot because almost always, you have to press a single key to turn something on, or press the same key to turn it off. This is extremely useful in practice.

So, just press 1 (yeah, the number 1) on the top window display above. And here’s what you see:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
top - 08:00:34 up 22:42,  5 users,  load average: 1.07, 1.06, 0.69
Tasks: 269 total,   3 running, 266 sleeping,   0 stopped,   0 zombie
%Cpu0  : 18.3 us,  0.7 sy,  0.0 ni, 81.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 99.0 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu2  : 13.7 us,  0.7 sy,  0.0 ni, 85.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 44.5 us,  0.7 sy,  0.0 ni, 54.5 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8139376 total,  5941144 used,  2198232 free,   183408 buffers
KiB Swap:  3227644 total,   218716 used,  3008928 free.   960956 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                             
 7525 kedar     20   0 3462036  24084  10600 S 100.1  0.3  11:44.26 java                                                
 7856 root      20   0   30948   1860   1568 R  66.5  0.0   0:02.00 copy                                                
 1018 kedar     20   0   86536  33812   3060 S   2.0  0.4  18:42.62 ruby                                                
 1453 root      20   0  314220 118164  33908 S   1.7  1.5   7:34.79 Xorg                                                
 3035 kedar     20   0 1843440 349756  54168 S   1.0  4.3   8:27.15 compiz                                              
 3920 kedar     20   0  895880  23000   9792 S   1.0  0.3   0:56.78 gnome-terminal                                      
 7836 root      20   0   33924   3172   2304 S   1.0  0.0   0:00.03 apt-get                                             
14857 kedar     20   0  848748 113632  16644 S   1.0  1.4  11:42.27 chrome                                              
 7852 root      20   0   34568   3124   1752 S   0.7  0.0   0:00.02 bzip2                                               
14880 kedar     20   0  862848 106644  14100 S   0.7  1.3  10:40.56 chrome                                              
14919 kedar     20   0 1555700 451912  34724 S   0.7  5.6  25:28.35 firefox                                             
 3088 kedar     20   0 1481860  26324  16220 S   0.3  0.3   0:27.03 nautilus                                            
 4395 kedar     20   0  496472   5440   3596 S   0.3  0.1   0:02.64 update-notifier                                     
 7802 kedar     20   0   26564   1812   1164 R   0.3  0.0   0:00.02 top

Now compare this and that. You see that the statistics for each processor are suddenly appearing! Yeah, it is a quad-core processor! The recent load average is hovering around 1, which means about one of the four processors this box has, is busy! The Cpu1 is busy almost all the time in the so-called ‘userland’ (shown by 99.0us) which is my code. So, my code is busy keeping the processor that runs it busy :-). But which process is that? Ah, I see the first line with PID 7525 to have %CPU as 100.1. Thus, this process is keeping more than 1 processor busy!

I also see that the memory available on this box is about 8 GB and about 2 GB is just free, and about 1 G is in cache, which means plenty of memory is still available. But yes, with processing, we are falling somewhat short.

I then get intrigued by this java process (PID: 7525) and want to analyze it further. So, I quit top (hit ‘q’) and rerun it as: top -p 7525 and then again, press 1, I see a much focused display:

1
2
3
4
5
6
7
8
9
10
11
top - 09:33:04 up 1 day, 14 min,  5 users,  load average: 1.27, 1.19, 1.16
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  6.4 us,  0.7 sy,  0.0 ni, 92.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  2.7 us,  0.7 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  3.4 us,  0.3 sy,  0.0 ni, 96.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8139376 total,  6911576 used,  1227800 free,   478440 buffers
KiB Swap:  3227644 total,   215492 used,  3012152 free.  1209696 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                             
 7525 kedar     20   0 3462036  36320  10608 S 100.2  0.4 104:14.49 java

You see that the second line in this display shows that the number of Tasks is 1 total. So, you wonder what ‘Task’ means. Googling around you may come to know that for top, Task is same as process (like sometimes it happens in math, in computers, the terminology can be quite confusing). Then it says that 0 task(s) are running and 1 task(s) are sleeping, out of 1 total. That is definitely confusing. If the task/process was consuming 100% of the CPU, how come top shows that it is sleeping (don’t know if that’s a bug)?

OK, but this doesn’t tell me about the Java threads (yet). The thread that you create as a Java Language Construct (by extending java.lang.Thread or implementing java.lang.Runnable and starting it by calling the start() method) should somehow result in something at the system level – that’s a fair expectation – you tell yourself. And you are right, top helps you ‘see’ that as shown in the next section.

Java and top

Let’s now look at a simple Java program that I compile and run:

(Threads.java) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
import java.util.*;
import java.io.*;
public class Threads {
  static volatile boolean stop = false;
  static volatile boolean stopMain = false;
  public static void main(String[] args) throws Exception {
    Thread reader = new Thread(new Reader());
    reader.start();
    while (!Threads.stopMain) {
      ;
      //do nothing, or keep sending traffic on interconnect!
    }
  }
}
class Reader implements Runnable {
  private final BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
  public void run() {
    while (!Threads.stop) {
      String s = null;
      try {
        s = reader.readLine();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
      if ("n".equals(s)) {
        Thread t = new Thread(new Runner());
        t.start();
        System.out.println("started new thread: " + t.getName());
      } else if ("q".equals(s)) {
        Threads.stop = true;
        Threads.stopMain = true;
      } else if ("m".equals(s)) {
        Threads.stopMain = true;
      } else {
        //ignore ?
        System.out.println("press 'm', 'n' or 'q'");
      }
    }
  }
}
class Runner implements Runnable {
  static final Random r = new Random();
  public void run() {
    while (!Threads.stop) {
      long sum = r.nextInt(1000);
      for (int i = 0; i < 1000000; i++) {
        sum += i;
      }
      /*
      try {
        System.out.println("did looping for a bit, now i'm sleeping ..." + Thread.currentThread().getName());
        System.out.println("sum: " + sum);
        Thread.sleep(2000);
      } catch (InterruptedException e) {
        throw new RuntimeException(e);
      }
      */
    }
  }
}

As you can see, the program creates ‘Runner Threads’ at will. It has a Main Thread and a Reader Thread. The Reader Thread interacts with the stdin and following happens:

  • If you press ‘n’ (meaning “new thread”), then a new thread is started. The new thread in its run method runs the so-called ‘busy loop’ where it just chooses several random numbers and adds them up. It does this forever. The moment this thread is scheduled to be run on a processor, it is expected to keep it busy, one hundred percent!
  • If you press ‘m’ (meaning “main thread”), the main thread, which, in Java parlance, is a non-daemon thread stops gracefully. While it’s not ‘stopped’ the main thread just waits to stop. In other words, it just checks the value of a volatile variable (sending too many reads on what is called the ‘interconnect’) all the time. This too, is expected to keep the processor and the interconnect (the bus that connects the chip and the main memory) very busy.
  • If you press ‘q’ (meaning “quit”), the JVM program/process quits gracefully, stopping all threads properly. There’s only one thread (Reader) that writes to the volatile variable and others just read it (no compound operations like the innocuous-looking but dangerous ++). So, any change to the variable’s value should be read by other threads. In other words, this is a ‘thread-safe’ little program.

I run it as:

1
2
3
java -Xmx1g -server Threads
  or, simply as:
java Threads

Now, how do you find the process ID (pid) of the JVM? Right, by doing: ps -afe | grep java. When I do that, here’s what I see:

1
2
3
➜  jymob_blog git:(master) ✗ ps -afe | grep java
kedar     7525  1132 99 07:48 pts/15   04:56:34 java -Xmx1g Threads
kedar    13607 13016  0 12:45 pts/0    00:00:00 grep java

There’s a reason for describing this in (rather excruciating) detail. The point I am trying to make is, you are looking for the pid of a process. We generally understand that every process has a ‘unique’ ID that identifies it and that ID is pid. But because of something peculiar that Linux does, this becomes complicated. And this is for historic reasons that we won’t go into here, but will come back to it.

OK, now do the same: top -p 7525 which will show you last top display.

Press 1.

Press c (yes, lowercase c).

Press H (yes, uppercase H).

This should show you a display like this (of course, your pid’s, i.e. the first column will be different):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
top - 12:51:41 up 1 day,  3:33,  6 users,  load average: 1.14, 1.19, 1.22
Threads:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.4 us,  0.0 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  4.0 us,  0.7 sy,  0.0 ni, 95.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  4.1 us,  0.3 sy,  0.0 ni, 95.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8139376 total,  7783300 used,   356076 free,   483020 buffers
KiB Swap:  3227644 total,   213308 used,  3014336 free.  1250832 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                              
 7525 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads
 7527 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                  
 7528 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads
 7529 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7530 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7531 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.31 java -Xmx1g Threads                                                
 7532 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7533 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7534 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7535 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.17 java -Xmx1g Threads                                                
 7536 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.18 java -Xmx1g Threads                                                
 7537 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.17 java -Xmx1g Threads                                                
 7538 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads                                                
 7539 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:05.46 java -Xmx1g Threads                                                
 7540 kedar     20   0 3462036  36320  10608 S  0.0  0.4   0:00.00 java -Xmx1g Threads

WOW!

You say. Yeah, I did too! You notice:

  1. The processor Cpu1 is busy doing work as expected (because of ‘1’).
  2. You see the entire java command line (because of ‘c’)
  3. The second line changes from Tasks: to Threads: and it starts showing 16 threads!
  4. The 16 threads are listed with every one of them getting its own pid!

Say what? Each thread getting a unique process ID, aka pid? So, is a process same as a thread? How come each thread gets a pid? And even if it gets, shouldn’t it be the same for the threads running in one single Java VM, because, after all, the JVM or your code started those threads? How can they get different pid’s?

If you now go back to Java program’s command prompt, and press ‘n’, you will see:

1
2
3
4
5
6
7
At the Java Command Prompt:

started new thread: Thread-1

In the top Window, an additional line is added:

 7541 kedar     20   0 3462036  36320  10608 S  99.9  0.4   0:00.00 java -Xmx1g Threads

If you pressed ‘n’ again, a new thread that keeps yet another processor busy is started. Thus, if you pressed ‘n’ twice you will have 3 of the 4 CPU’s busy at 100%. (Don’t assume that a thread is ‘pinned to’ a CPU however. It’s completely up to the System Scheduler to assign processors to threads.) Roughly though, 75% of your entire system is now busy running your little Java threads! In this case, if you pressed ‘m’, the main thread would stop (gracefully), but the JVM leaves on. The reason for that is the non-daemon-ness of threads is inherited and since main thread (which was non-daemon) started off Reader thread, it (Reader) becomes non-daemon making each of the Runner threads non-daemon too (this is all configurable though, I am describing the ‘default behavior’ :-)). Since at least a non-daemon thread is running, JVM process does not go down.

Back at the top window:

Press f (lowercase f).

Then, take the cursor (down arrow) to PPID, press Space, take the cursor to nTH, press Space and take it to TGID and press Space.

Press Escape or q.

You should see something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Threads:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.7 us,  0.3 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  4.4 us,  0.3 sy,  0.0 ni, 95.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  2.7 us,  0.0 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8139376 total,  7125652 used,  1013724 free,   153168 buffers
KiB Swap:  3227644 total,   209264 used,  3018380 free.   689972 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                PPID nTH  TGID 
 7526 kedar     20   0 3462036  34440   8728 R 99.9  0.4 454:12.31 java -Xmx1g Threads    1132  16  7525 
 7525 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7527 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7528 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7529 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7530 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7531 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.49 java -Xmx1g Threads    1132  16  7525 
 7532 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7533 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7534 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7535 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.22 java -Xmx1g Threads    1132  16  7525 
 7536 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.23 java -Xmx1g Threads    1132  16  7525 
 7537 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.21 java -Xmx1g Threads    1132  16  7525 
 7538 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525 
 7539 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:08.70 java -Xmx1g Threads    1132  16  7525 
 7540 kedar     20   0 3462036  34440   8728 S  0.0  0.4   0:00.00 java -Xmx1g Threads    1132  16  7525

Scroll the above window All the Way to Right.

Look at the TGID column. This is where the glimpses of Linux’s threading library implementation are beginning to reveal. All the threads have the same ‘Thread Group ID’, although all of them have different pid’s. The Thread Group ID of all the threads is same as the process ID of the Java (JVM) process – this is the process ID that ps reported back before we started analyzing it with top.

So, you think that, maybe the Java/Native threads created in the JVM are fully specified by the (PID, TGID) pair? Yeah, that seems right. The JVM process (PID: 7525) also has the TGID 7525. The JVM then creates a native thread that runs the Java code in the so-called Main Thread. The main thread, you guess, would be the (7526, 7525) one, because at the moment, it’s keeping the CPU at a 100% (because of that busy-loop that reads the value of volatile variable: stopMain) and all the other are not doing any work.

Is that enough to conclude? Perhaps. But we programmers should sleuth and since (wo)man is a tool-using animal, we need tools. When I was befuddled by this, I somehow recalled another ‘excellent’ Linux tool: strace. This tool lets you trace the system-calls done by any process. So, I run strace next. For this, I close the Java program and rerun it, however, strace can attach to a running process and profile its system calls.

Run: strace -ff -o /tmp/trace.log java -Xmx1g Threads

From strace man page:

1
2
3
4
5
6
-f:  Trace child processes as they are created by currently traced processes as a  result  of  the  fork(2),
     vfork(2) and clone(2) system calls. Note that -p PID -f will attach all threads of process PID if it is
     multi-threaded, not only thread with thread_id = PID.
-ff: If the -o filename option is in effect, each processes trace is written to filename.pid  where  pid  is
     the  numeric process id of each process.  This is incompatible with -c, since no per-process counts are
     kept.

Here is the output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
trace.log files:
 12 -rw-rw-r-- 1 kedar kedar   9998 Dec  3 15:54 trace.log.17449
164 -rw-rw-r-- 1 kedar kedar 165991 Dec  3 15:54 trace.log.17450
  4 -rw-rw-r-- 1 kedar kedar    967 Dec  3 15:54 trace.log.17451
  4 -rw-rw-r-- 1 kedar kedar    880 Dec  3 15:54 trace.log.17452
  4 -rw-rw-r-- 1 kedar kedar    845 Dec  3 15:54 trace.log.17453
  4 -rw-rw-r-- 1 kedar kedar    924 Dec  3 15:54 trace.log.17454
 40 -rw-rw-r-- 1 kedar kedar  35210 Dec  3 15:58 trace.log.17455
  4 -rw-rw-r-- 1 kedar kedar    948 Dec  3 15:54 trace.log.17456
  4 -rw-rw-r-- 1 kedar kedar   2666 Dec  3 15:54 trace.log.17457
  4 -rw-rw-r-- 1 kedar kedar   1046 Dec  3 15:54 trace.log.17458
 16 -rw-rw-r-- 1 kedar kedar  12411 Dec  3 15:58 trace.log.17459
 12 -rw-rw-r-- 1 kedar kedar  11529 Dec  3 15:58 trace.log.17460
 12 -rw-rw-r-- 1 kedar kedar  11038 Dec  3 15:58 trace.log.17461
  4 -rw-rw-r-- 1 kedar kedar    948 Dec  3 15:54 trace.log.17462
660 -rw-rw-r-- 1 kedar kedar 674837 Dec  3 15:58 trace.log.17463
  4 -rw-rw-r-- 1 kedar kedar   1005 Dec  3 15:54 trace.log.17464

ps:
kedar    17449 17444 98 15:54 pts/15   00:00:15 java -Xmx1g -verbose:gc Threads

Now, my favorite Linux system call is fork which (I thought) was the only way to create a new process in *nix. I thought maybe JVM is using pthreads library, who knows? I was prepared to grep for something like pthreads_create. So, I look at the trace.log.17449 and here’s what I find as the last line:

1
2
3
4
5
clone(child_stack=0x7fabfbd32fb0, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|
CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, 
parent_tidptr=0x7fabfbd339d0, tls=0x7fabfbd33700, child_tidptr=0x7fabfbd339d0) 
= 17450

Oh I see! On Linux, on this particular kernel, the system call this JVM uses is clone! And this SO thread tells you about the ‘exceptional approach’ Linux takes about threads.

From man 2 clone:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
clone() creates a new process, in a manner similar to fork(2).
This page describes both the glibc clone() wrapper function and the 
underlying system call on which it is based.  The main text describes 
the wrapper function; the differences for the raw system call are 
described toward the end of this page.

Unlike fork(2), clone() allows the child process to share parts of 
its execution context with the calling process, such as the 
memory space, the table of file descriptors, and the table of i
signal handlers.  (Note that on this manual page, "calling process" 
normally corresponds to "parent process".  But see the description 
of CLONE_PARENT below.)

The main use of clone() is to implement threads: 
multiple threads of control in a  program  that  run  concurrently  in  a
shared memory space.

Now things begin to make (more) sense. If you carefully analyze the parameters passed by the JVM to the clone() system call you will know the choices it is making. The clone system call, just like the fork system call returns the pid of the created thread/process. The TGID of the created thread however stays the same, which is the pid of the ultimate process that starts all of it, the pid of the ‘java’ process that identifies the JVM.

This seems to nicely come together. I see the clone system calls in the relevant strace log files.

Java Thread Dumps

What about the Java Thread Dumps? IMO, thread dumps are one of the best diagnosis tools to find out what is going on with your Java Threads. And you can get thread dump using the kill -3 <pid> command that sends the SIGQUIT signal to the given pid. And now that we know that each thread has its own pid, you can send the SIGQUIT signal to any of the threads (pid: 17449 to 17463) to produce the dump or stack of all the threads! The pid’s of the threads are revealed in the thread dumps as well. If you did kill -3 18788 assuming 18788 is the pid of one of the threads started by the JVM, you will see something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f1b800dc000 nid=0x4973 runnable [0x00007f1b68cb0000]
   java.lang.Thread.State: RUNNABLE
  at java.io.FileInputStream.readBytes(Native Method)
  - locked <0x00000000d69dbd48> (a java.io.InputStreamReader)
...

"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007f1b800c9800 nid=0x4971 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f1b800b1000 nid=0x4970 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f1b800af800 nid=0x496f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f1b800ac800 nid=0x496e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f1b800aa800 nid=0x496d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=0 tid=0x00007f1b80072800 nid=0x496a runnable 
....

Note that you can send the SIGQUIT signal to any of the pid’s that are identified by those nid values in the thread dump: It’s all in the family :-) of the thread group the thread belongs to. For some reason, the JVM guys chose to format those nid values in hex (e.g. hex 496a = 18794 in decimal).

Lessons

  1. We should know top (and other diagnosis tools) really well. Memorizing the keystrokes and understanding their meaning really helps.
  2. Be curious.

Coursera-ml-course-and-beyond

Writing after a long hiatus for life has been busy.

But today, I want to tell you about a wonderful experience I had for last 3-4 months. I finally completed the Coursera Machine Learning Course successfully.

This course taught by Professor Andrew Ng from Stanford University is an absolute gem. I would recommend it to any engineer with enough interest in math and a curious mind. The things that stand out for this course are:

  1. Excellent background work done by Coursera. The professor himself, the teaching assistants, the course technique are totally flawless. They have done tremendous amount of work on it and their passion is quite visible. The video player is an innovation too. (The iPhone app could be better, especially in terms of pause and magnification).
  2. The professor loves to teach and the way he has developed his style particularly suits the online courses. He is lucid in his explanation and also quite encouraging. He perhaps understands the pioneering work Coursera/edX/Khan Academy are doing and acknowledges that the students (7.5 million and counting) who have taken this course are busy elsewhere. He perhaps thinks intuitively that education is a configurable and cooperative process. Configurable means personalized and any education endeavor on the web (a successful one) has to be a cooperative task.
  3. The assignments are moderately hard. They are perhaps not as hard as CS-229 offered by Stanford, but they keep you quite busy and interested. The choice of tools is excellent for a starting course like this.
  4. The ecosystem around the course is excellent.

There are several experiments happening around learning and education and as always, we live in interesting times! It is yet to be seen what the Internet has to offer in terms of education, but the outlook is very positive and I believe the online courses are going to be more and more effective.

The biggest interest on my part is really the personalization and freedom in learning. We are beginning to see that the ways to spread knowledge and impart education can truly be transformed via the Web. You no longer need to feel miserable for you couldn’t take a particular course during your normal college education. There is time before any kind of revolution happens but things are evolving at a decent pace. I am very optimistic about how our and next generation deals with this experiment. I hope the right things emerge out of this – you are free to learn anything you want!

By the way, (shameless plug) I did get my certificate and I am very happy about it for this opens up possibilities in applying sensible machine learning techniques to many problems. I am extremely grateful to my family in general and my wife and JyMob Cofounder in particular for the support and encouragement they endowed upon me!

What do you think?

On Designing Questionnaires That Are Conclusive

One of the strongest motivations for us at JyMob is the quality of its questionnaires. It’s our real IP. We strongly believe that to improve hiring inefficiencies, companies need to employ strategies that simply:

  1. Screen-in the relevant candidates, and
  2. Screen-out the irrelevant (not bad) candidates

This is, of course, easier said than done. It can only happen if the question-setters are committed to the worthwhileness of the questions. They should be conclusive. If a correct or incorrect answer to a question is inconclusive of whether the candidate understands and has solved it correctly, then, in our opinion, the question has failed.

Unfortunately, such questions form a vast majority of questions asked in online candidate assessment products. At JyMob, therefore, we are quite sensitive to forming questions that are worthwhile. There’s no point in further complicating an already complex process such as ‘hiring’. From an ‘objective assessment’ standpoint, there’s nothing wrong in having a question that is multiple-choice or multiple-select, but as it turns out, creating a challenging, fun and clear multiple-choice, multiple-select question is often time-consuming, if not difficult. Our experts, we believe know how to do it and we are constantly look for experts who share this thought process.

For example, take a look at this question that appears in JyMob’s problem database: a good multiple-choice question.

Here are the criteria for a good multiple-choice question satisfied by this one:

  • It makes it (almost) imperative for the candidate to attempt this question seriously. Of course, they can apply brute force and choose one of the 5 options provided. But then again, JyMob makes it clear that the hiring team for this job may ask you to provide actual code (demonstrating your thought process) that resulted in your selection of a given option.
  • It’s suited for an ‘online’ assessment in that it makes candidates do something before resorting to online havens like Google, Stack Overflow, Wikipedia etc.
  • It’s not based on some rather obscure feature of computing. If you are working on data interchange, you need to know how the data is interpreted on various computers!
  • It’s conclusive enough. By providing an “indirection” in terms of an attachment, the designer of this question clearly made the candidate work.

I will write later about our philosophy of various ‘testing models’ we are playing with in order to provide an environment where candidates might be assessed in the most appropriate manner. But for now, we can see the power of simple multiple-choice questions designed with the end in mind – being able to conclusively assess a candidate.

Candidate’s comfort level is another thing we, as question setters have in our minds. We don’t want to annoy candidates that you would love to interview. Thus, we want to make the question as clear as possible. We believe we have been extremely diligent in our endeavor.

If you are, or know anyone who loves to design such questions, please let us know (or let them know that we, at JyMob, are interested in using their skills by compensating them well for their creativity).

The Power of JyMob Badge for HR People

In this slightly longer post, I try to explain the idea behind the JyMob Badge. We believe that this post is especially useful for those who work in the HR function of companies and want to innovate it for effective hiring. So, if you are an HR decision maker or a recruiter at a company, this post is for you. Even if you are not an “HR person”, but care deeply about effective hiring, this post will bring about the essence of JyMob’s approach.

What JyMob Badge Is

When you post a job at JyMob, you get to create a link for that job. Simply speaking, this is the JyMob Badge for that job. In technical terms, it is a simple HTML link which when placed on an HTML page looks like this:

When someone clicks on the badge, s/he is seamlessly led into the JyMob pre-hire assessment. That’s it. So, you just have to do the following:

  1. Post a Job in seconds.
  2. Get the JyMob Badge for it and place it on your job ad. It helps if all ads for your job pass through one page to keep your life simple. This way, all the applicants see the badge.

Easy, isn’t it?

Benefits of the Badge

This easy way of getting the badge brings you some enormous benefits. First of all, just ask yourself – Didn’t these words: “Show of Your Skills” challenge me? – I am sure they did. And that’s increasing the effectiveness of your job ad right there. It’s enticing enough so that active job seekers are challenged to prove that they are eligible for the job. If they realize that proving their relevance via an online test may bring them a chance to interview with you, they will click on the badge.

By designing the test with ease (because of JyMob’s techniques) and with great care at the same time (remember, it’s not “just the test for the sake of it”), you are guaranteed to attract the right talent – the talent that might be right for the job and, more importantly, the talent that is willing to take the crucial first step of helping you determine the relevance in a structured, scientific way. And frankly, this talent is being found by simply placing your ad on ad channels (which you anyway do) and placing the JyMob badge on those ads. Choosing the best among the active (and willing) candidates this way complements your (expensive) efforts of sourcing the passive and semi-active candidates. Statistics show that just because they are seeking for a job doesn’t mean that the active job seekers are not suitable.

Candidates get pissed off when the tests are poorly designed and the tools to take them have an awful user experience. We avoid both these pitfalls at JyMob. You are simply increasing your chances of getting sensible candidates without having to burden a typical recruiter with assessing the technical competence which is a task that should be left to subject matter experts. Agreed, these subject matter experts are not cost-effective, but that’s exactly what we have set out to do at JyMob – bring you a set of subject matter experts at affordable prices, so that your energy is expended in later stages.

What it Means to Your HR Efforts

In our experience in building JyMob as a company, we have come to understand that JyMob Badges are an interesting innovation that HR personnel can very easily accomplish. Hiring is hard and hiring the right talent is harder, no one knows this better than you folks, the loyal HR people! Yet, we find that there is a widely held perception that HR function is hardest to innovate. Perhaps that notion is true, but by being able to do small experiments, you can increase your effectiveness – in this case for instance, you can use JyMob to improve the hiring efforts in no time and without any significant overhead. You don’t have to change your practices – seamlessly involve your engineering/sales/support teams in your efforts to improve the throughput of good, relevant candidates which your hiring teams like to spend time interviewing. I believe this is one of the findings of a recent research done by Bersin Associates – HR people must improve their use of tools. Otherwise, I fear that effectiveness of recruiting done by HR people will continue to deteriorate and that significant function (recruiting) will be influenced more and more by people in other functions of your company.

It’s time for you, the HR people, to step up and do some experiments like JyMob. JyMob’s integration is seamless and lightweight. You don’t have to get caught in yearly contracts and get locked up in corporate-wide deployments before your are convinced of its effectiveness. Ironically, it’s this lack of chance to innovate that makes it hard for HR people to prove effective.

I have no doubt that JyMob will help in both short and long run – effective pre-hire assessment is critical. We just make it easy for you to do it online – remember, software is eating the world – you just have to choose the tools that are effective and are a pleasure to work with.

If I Were to Try to …

While doing anything serious, or anything at all, Lincoln’s advice is spot on. He once (in extremely trying circumstances) said:

If I were to try to read, much less answer, all the attacks made on me, this shop might as well be closed for any other business. I do the very best I know how – the very best I can; and I mean to keep doing so until the end. If the end brings me out all right, what’s said against me won’t amount to anything. If the end brings me out wrong, ten angels swearing I was right would make no difference.

These are really powerful words. They resonate with you every time you are at the helm and you are disappointed. Perhaps not so surprisingly, like other walks of life, these words apply to entrepreneurship. While doing the things you do, there are challenging moments and keeping such a “being with the end in the mind” approach helps.

Ranking Candidates by Relevance Score

We are excited to announce that we are experimenting with a robust new feature for JyMob. This is related to providing a quantifiable assessment of a candidate’s relevance to a given job.

Yes, our machine learning algorithms now do a great deal in trying to find out the match between job descriptions and candidate resumes. Of course, we know that a lot gets lost in translation when it comes to job descriptions and resumes, but a robust candidate assessment service should have a way to quickly sort candidates based on their overall relevance to the given job. This is especially useful in high volume cases.

We plan to tweak the matching algorithm in coming weeks and months. If you want to participate in helping us perform better, we request you to take a look and provide some feedback. Seasoned recruiters, hiring managers and engineers are extremely good at deciding the match amidst imprecise data (like job descriptions and resumes), and we want to improve our algorithms to become reliable in deciding a match.

The feature integrates nicely with the list of candidates and is available when the candidates show off their skills on our tests. You should really try it out yourself, but here is what it looks like:

The candidates are nicely sorted on the score and when you click on a particular score, you get the details that went into determining the score:

Nice, isn’t it?

But wait, that’s not all. We encourage you to form the hiring team for your jobs and we also calculate the hiring team sentiment based on how candidate has performed on the display of their skills! Here’s what it looks like:

We hope this helps you in making an informed decision about whether to invite a candidate or to archive his/her job application (which is also a feature I will talk about soon).

If you haven’t already, give JyMob a try. It’s well worth your time :-).

Some Ruby Fun

Building a startup is hard like so many well-written articles and personal experiences will tell you. But you don’t have to give up what you love (in addition to doing your startup) while doing it. Agreed, the best way to do many things is to do them one at a time: you can’t be building a startup, learning to cook Italian recipes, learning to improve your chess Elo rating from 1800s to 2100s, get really good at math or a new language all at once. Prioritize.

But doing little things you always wanted to do is welcome. It stimulates your way back into your startup struggle and reinvigorates you.

I like explaining things I think I have understood well to others. In that process, I learn a lot myself. Recently, I started dabbling with the Ruby programming language. I wrote some introductory text about understanding Ruby’s classes and objects. It has received a reasonably good feedback on the Ruby-talk mailing list.

Hope you enjoy some Ruby fun!

The JyMob Problem Database

It is imperative that you test the job candidates to your satisfaction. If they are technically competent, you should make sure that there is a cultural fit. In general, humans are great at deciding whether another person is nice to work with. Simply put, if you are reasonably sure about technical prowess/competence of a person you are trying to hire, all there’s in the personal interviews is to ensure that s/he is not a jerk. Recently, Lisa Quast wrote about this and put a special emphasis of testing active job seekers.

Now, to be sure of one’s technical competence is hard. Doing it using the computers is even more so. In these days of fast search engines, testing for the sake of it is utterly meaningless. Hiring is a complex process. Making it even more complex by having a “technical test” appears unnecessary. And it is, if the test is testing nothing specific and if the problems are “googleable”. Granted, we all use Google, Quora, Stack Overflow, Wikipedia, the Internet and our tools to solve the problems we face in professional career. Amidst such a resourceful environment however, it is possible for humans to pose/create problems that are interesting and really test if someone knows the stuff s/he claims. Asking such questions/problems also gives the candidates a glimpse of what you/your company/a particular group is working on.

And then, it is not about just the set of problems or questions in a database. It’s about the testing models. A question can be a carefully crafted multiple choice problem, a puzzle, fixing a (purposely) botched computer in the cloud, a piece of source code in an open source project, a picture in the form of attachments and so on! This is what the JyMob Problem Database is all about. As you know, the core value proposition of JyMob is in creating a test by choosing relevant problems from this growing problem database. Here is what is different about our problems/questions:

  1. It is created and curated by humans. We are constantly looking for experts who love creating innovative problems for us :-). Nothing is as effective as asking experts to contribute real-life problems to our database.

  2. As employers, you have access to all the public problems in the database.

  3. You can create your own problems and not share them with others. What this means is that you can define all your problems and use them for your own jobs.

  4. The problems are of various kinds. We intend to provide you with an infrastructure that is useful for you to test a particular skill. In some cases, asking moderate-to-hard programming problems with varying algorithmic complexity is required. In other cases, you want to find out how good the sense of a user’s sense of “user experience” is. In some other cases, you want to find out whether the candidate applying for a QA position knows about how testable some code is or how to write testable code.

  5. The problems are easily searchable by keywords. You can choose your own problems and generate a test on your own. Every candidate applying for your job(s) can get the exact same test comprising of the problems you choose. If you want to make the tests less predictable (gameable), you should rely on our matching algorithm, however.

The entire treasure trove of problems is available to you when you post your job(s). Once the candidates submit the solutions, you can create discussion threads where you can ask for clarifications and improvements. This keeps an entire record of discussion you have had with the candidate. Going ahead, there will be a possibility of scheduling interactive chat session with one button click.

Such an infrastructure is both job-specific and job-agnostic. This means you can use it to test a sales engineer, a tech support person, a software engineer, a devops candidate etc. In other words, it is not just for coders.

The JyMob Problem Database is thus versatile, growing, dynamic and worthwhile. It helps you find the best people with confidence. It is JyMob’s core IP.

And yes, if you are an expert or know of any expert(s) who loves to compose and pose the problems ;), please let us know.

The JyMob Features

Now that the first JyMob Demo Day is over, I decided to make sure that I resume some targeted blogging so that we can make our customers aware of some of the key things that we have.

Of course, it is not about the feature list and the ugly competition that emerges as a result. Most of the features of a product should work in concert, like a well-made movie. Any time there is a mismatch, the feature should not make its way into the product. Maybe those other features should define another product or another service integrated into your flagship product.

To that effect, this blog post is going to be constantly upgraded as it is a list (or matrix, like Evan Miller suggests) of features JyMob has. Every time I write about a feature, I will link to it from here.

Before I begin, I just want to reiterate that JyMob is a service to transform the human resources function step-by-step. We want to beautify the career pages the companies have and help them attract the talent they need. At the same time, it is an avenue for job seekers to demonstrate their effectiveness. Beautifying career pages of companies has deep connotations and we believe that we will slowly unravel the magical products/services to do that …

So, here we go:

  1. The JyMob Problem Database.
  2. Deciding Relevance of candidates to jobs.
  3. The Power of the JyMob Badge.
  4. The JyMob Machine Learning Algorithm.
  5. Automated Assessment and the Hiring Committee.
  6. Versatile Test Environment.

The First Demo Day

In the (early) life of a startup, a specific day has a special meaning and importance. This day, fondly referred to as the “Demo Day” is the time when the startup founders show off their product to potential investors and others. Of course, the venerable Y Combinator has made this term very popular.

We at JyMob have been very excited about this day. About 6 months ago, we were admitted into the first batch of Alchemist Accelerator and our first ever demo day is set for tomorrow, Thursday, January 17, 2013.

Firstly, like the name says, it’s time to “demo” your product in about eight minutes. Yes, that’s right – you get 08 minutes to explain your vision, your mission and your message. Fortunately, for us, our product is our message. It is in a pretty good shape and that gives Deepa, a lot of confidence. Good luck, Deepa.

Of course, the ultimate power lies with you guys, the users of our service. But personally, I take demo day as a validation avenue. If I have really understood the product, its positioning and relevance, I should be able to generate enough interest in potential investors. Independent of whether my startup is accepting or should accept external money, the positive frame of mind should be whether I can explain my product to someone (who I don’t know) in a short amount of time. After all, someone is coming there and investing his or her precious time in listening to me. I have got to do my best in delivering the message. Period.

As for the product, yeah, like I said, we had been busy doing some really cool things with respect to making JyMob’s distribution easy. New features, new look of the website is refreshing for all of us here. If you haven’t tried it already, give JyMob a try, now!