I ran some benchmarks overnight for the UST registration side. I have a private branch using the shrinkable hashtable from URCU (not yet upstream), which is used for almost every UST data structure.

CPU cycles are measured from the start of a UST registration (when the register packet is received from the app) to the end, which is when the app is added to the hashtable (after the "register done" is sent).

The "unregistration" is measured from the moment the socket close is detected until the removal from the hashtable.
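To make the measurement points concrete, here is a minimal sketch of timing one section of code between a start and an end point. It is an assumption on my side, not the instrumentation used for the numbers in this mail: it reads `clock_gettime(CLOCK_MONOTONIC)` instead of a cycle counter, and `now_ns()`, `time_section()` and `spin()` are hypothetical names made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Hypothetical stand-in for the work being measured. */
static void spin(void *arg)
{
	volatile unsigned long i, n = *(unsigned long *) arg;

	for (i = 0; i < n; i++)
		;
}

/* Time one call to fn(arg) and return the elapsed time in seconds. */
static double time_section(void (*fn)(void *), void *arg)
{
	uint64_t start, end;

	start = now_ns();	/* e.g. register packet received */
	fn(arg);		/* section being measured */
	end = now_ns();		/* e.g. app added to the hashtable */
	return (double) (end - start) / 1e9;
}
```

Calling `time_section(spin, &n)` with an `unsigned long n` gives one sample in seconds; a benchmark would collect one such sample per run.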

1000 runs were made, each with one application registering, saying "Hello World" and unregistering.
CPU: i7 920

=== UST register data ===
Average time: 0.00007104930232558147 sec
Standard variation: 0.00027245202875200215 sec
Best run: 0.00004025056264066017 sec
Worst run: 0.00781043510877719397 sec
=== UST unregister data ===
Average time: 0.00001046121155288823 sec
Standard variation: 0.00001924941982788751 sec
Best run: 0.00000556114028507127 sec
Worst run: 0.00061544636159039767 sec
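For reference, summary figures of this kind can be derived from the per-run samples roughly as follows. This is only a sketch under assumptions: I take the "Standard variation" line to be a standard deviation, the struct and function names are hypothetical, and the code returns the population variance (the standard deviation is its square root) so that it does not depend on libm.

```c
#include <stddef.h>

/* Hypothetical summary of a series of per-run timings, in seconds. */
struct run_stats {
	double avg;		/* arithmetic mean */
	double variance;	/* population variance; stddev = sqrt(variance) */
	double best;		/* smallest sample */
	double worst;		/* largest sample */
};

static struct run_stats compute_stats(const double *samples, size_t n)
{
	struct run_stats s = { 0.0, 0.0, 0.0, 0.0 };
	double sum = 0.0, sq = 0.0;
	size_t i;

	if (n == 0)
		return s;

	/* First pass: sum, best and worst. */
	s.best = s.worst = samples[0];
	for (i = 0; i < n; i++) {
		sum += samples[i];
		if (samples[i] < s.best)
			s.best = samples[i];
		if (samples[i] > s.worst)
			s.worst = samples[i];
	}
	s.avg = sum / (double) n;

	/* Second pass: squared distance from the mean. */
	for (i = 0; i < n; i++) {
		double d = samples[i] - s.avg;

		sq += d * d;
	}
	s.variance = sq / (double) n;
	return s;
}
```

With 1000 samples, a single slow run is enough to pull the deviation well above the average, so the best and worst values are worth reading alongside it.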

I've also run it with the old scheme (linked list NOT RCU protected):

=== UST register data ===
Average time: 0.00010490242692252016 sec
Standard variation: 0.00000532420457958496 sec
Best run: 0.00006539197299324832 sec
Worst run: 0.00011769692423105777 sec
=== UST unregister data ===
Average time: 0.00001823643016017162 sec
Standard variation: 0.00000102737202505177 sec
Best run: 0.00001565866466616654 sec
Worst run: 0.00002700675168792198 sec

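Well, we can see an improvement in the registration time, since no mutex needs to be acquired: the average drops from about 105 us to about 71 us. For the unregister time, the improvement is smaller in absolute terms (from about 18 us to about 10 us) and is also explained by the removal of the mutex lock.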
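To show where the mutex sits in the old scheme, here is a minimal sketch of a mutex-protected linked list. It is an illustration only, not the actual session daemon code: `struct app`, `app_register()` and `app_unregister()` are hypothetical names, and the real structures carry much more state.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical app entry, keyed by its socket. */
struct app {
	int sock;
	struct app *next;
};

static struct app *app_list;
static pthread_mutex_t app_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registration: every insertion serializes on the list mutex. */
static int app_register(int sock)
{
	struct app *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->sock = sock;
	pthread_mutex_lock(&app_list_lock);
	a->next = app_list;
	app_list = a;
	pthread_mutex_unlock(&app_list_lock);
	return 0;
}

/* Unregistration: the list is walked with the mutex held, O(n). */
static int app_unregister(int sock)
{
	struct app **pp, *a = NULL;

	pthread_mutex_lock(&app_list_lock);
	for (pp = &app_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->sock == sock) {
			a = *pp;
			*pp = a->next;	/* unlink the entry */
			break;
		}
	}
	pthread_mutex_unlock(&app_list_lock);
	if (!a)
		return -1;
	free(a);
	return 0;
}
```

Both paths take the same lock, so concurrent registrations and unregistrations wait on each other. That lock acquisition is the cost the hashtable version avoids.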

Note that there is only *one* app here, so insertion/lookup/deletion is O(1) in both the list and the hashtable.

I'm currently running a benchmark with 1000 apps registering and unregistering at the same time. There we'll see the power of RCU hashtables :)