A closer look at CMWQ

alloc_workqueue() is used to allocate a wq.

It takes three parameters:

  • @name is the name of the wq.
  • @flags controls how work items are assigned execution resources, scheduled and executed.
  • @max_active determines the maximum number of execution contexts per
    CPU which can be assigned to the work items of a wq.
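
As a rough illustration of the call described above, a minimal kernel-module sketch might look like the following. The names demo_wq, demo_work and demo_work_fn are made up for this example. The choice of WQ_UNBOUND and a max_active of 0 (which asks for the default limit) is an assumption, not a recommendation. The code builds only as part of a kernel module, not in userspace.

```c
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/workqueue.h>

/* Hypothetical names, used only for this sketch. */
static struct workqueue_struct *demo_wq;
static struct work_struct demo_work;

/* Runs in process context on a worker thread managed by the workqueue core. */
static void demo_work_fn(struct work_struct *work)
{
	pr_info("demo_wq: work item running\n");
}

static int __init demo_init(void)
{
	/* Arguments: name, flags, max_active (0 selects the default). */
	demo_wq = alloc_workqueue("demo_wq", WQ_UNBOUND, 0);
	if (!demo_wq)
		return -ENOMEM;

	INIT_WORK(&demo_work, demo_work_fn);
	queue_work(demo_wq, &demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* Drains any pending work items, then frees the wq. */
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

Changing the flags or max_active changes how the work item is scheduled, but the queueing code stays the same.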

Detailed examples of workqueue usage, based on my patches, will follow in subsequent posts.

 

Example Scenario

 

In the test, a work is described by four parameters: burn_usecs, mean_sleep_msecs, mean_resched_msecs and factor.

It randomly splits burn_usecs into two, burns the first part, sleeps for 0 to 2 * mean_sleep_msecs, burns what's left of burn_usecs and then reschedules itself in 0 to 2 * mean_resched_msecs. factor is used to tune the number of cycles to match execution duration.
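
The behaviour described above can be sketched in plain userspace C. This is not the original test code: the struct and function names are hypothetical, the factor parameter is left out, and rand() with a modulo is used for brevity (it has a slight bias). The sketch only computes the plan for one run of a work; it does not burn CPU or sleep.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parameter block; 'factor' is not modelled here. */
struct work_params {
	unsigned int burn_usecs;
	unsigned int mean_sleep_msecs;
	unsigned int mean_resched_msecs;
};

/* What a single run of the work would do. */
struct work_plan {
	unsigned int first_burn_usecs;   /* burned before sleeping */
	unsigned int second_burn_usecs;  /* burned after sleeping */
	unsigned int sleep_msecs;        /* in [0, 2 * mean_sleep_msecs] */
	unsigned int resched_msecs;      /* in [0, 2 * mean_resched_msecs] */
};

/* Pseudo-random integer in [0, max], with a slight modulo bias. */
static unsigned int rand_upto(unsigned int max)
{
	return (unsigned int)((unsigned long)rand() % ((unsigned long)max + 1UL));
}

struct work_plan plan_one_run(const struct work_params *p)
{
	struct work_plan plan;

	assert(p != NULL);

	/* Randomly split the burn time into two parts. */
	plan.first_burn_usecs = rand_upto(p->burn_usecs);
	plan.second_burn_usecs = p->burn_usecs - plan.first_burn_usecs;

	/* Drawing from [0, 2 * mean] gives an average of 'mean'. */
	plan.sleep_msecs = rand_upto(2 * p->mean_sleep_msecs);
	plan.resched_msecs = rand_upto(2 * p->mean_resched_msecs);

	return plan;
}
```

Sampling each delay from 0 to twice the mean is why the parameters are named mean_sleep_msecs and mean_resched_msecs: over many runs the delays average out to those values.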

Stats from: http://lwn.net/Articles/393172/

The test issues three types of works (short, medium and long), each with two burn durations, L and S.

         burn/L(us)   burn/S(us)   meanSleep(ms)   meanResched(ms)   cycles
short    50           1            1               10                454
medium   50           2            10              50                125
long     50           4            100             250               42

These works are then combined into the following workloads. The lower-numbered workloads have more short/medium works.

 

WL0
* 12 wqs with 4 short works
*  2 wqs with 2 short  and 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work

WL1
*  8 wqs with 4 short works
*  2 wqs with 2 short  and 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work

WL2
*  4 wqs with 4 short works
*  2 wqs with 2 short  and 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work

WL3
*  2 wqs with 4 short works
*  2 wqs with 2 short  and 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work

WL4
*  2 wqs with 4 short works
*  2 wqs with 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work

WL5
*  2 wqs with 2 medium works
*  4 wqs with 2 medium and 1 long works
*  8 wqs with 1 long work
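
To see how the load shrinks from WL0 to WL5, the workload list above can be written down as data and summed. The following is only a bookkeeping sketch in userspace C: the struct layout and names are hypothetical, and the counts are copied from the list.

```c
#include <assert.h>
#include <stddef.h>

/* A group of identical wqs: nr_wqs wqs, each holding the given works. */
struct wq_group {
	int nr_wqs;
	int nr_short;
	int nr_medium;
	int nr_long;
};

struct workload {
	const char *name;
	struct wq_group groups[4];
	size_t nr_groups;
};

/* Counts copied from the workload list. */
const struct workload workloads[] = {
	{ "WL0", { {12, 4, 0, 0}, {2, 2, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 4 },
	{ "WL1", { { 8, 4, 0, 0}, {2, 2, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 4 },
	{ "WL2", { { 4, 4, 0, 0}, {2, 2, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 4 },
	{ "WL3", { { 2, 4, 0, 0}, {2, 2, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 4 },
	{ "WL4", { { 2, 4, 0, 0}, {2, 0, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 4 },
	{ "WL5", { { 2, 0, 2, 0}, {4, 0, 2, 1}, {8, 0, 0, 1} }, 3 },
};

/* Total number of works across all wqs in one workload. */
int total_works(const struct workload *wl)
{
	int total = 0;

	assert(wl != NULL);
	for (size_t i = 0; i < wl->nr_groups; i++) {
		const struct wq_group *g = &wl->groups[i];

		total += g->nr_wqs * (g->nr_short + g->nr_medium + g->nr_long);
	}
	return total;
}
```

Summed this way, the totals fall from 76 works in WL0 to 24 in WL5. The 8 wqs with one long work each are present in every workload.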

The above wq loads are run in parallel with a job converting a 76M mjpeg file into mpeg4. Without wq loading, the conversion takes 25.59 seconds with a standard deviation of 0.19. Each test case was run 11 times and the first run was discarded. There is no significant difference between the two (vanilla and cmwq).

 

       vanilla/L     cmwq/L        vanilla/S     cmwq/S
wl0    -             -             26.18 d0.24   26.27 d0.29
wl1    -             -             26.50 d0.45   26.52 d0.23
wl2    26.62 d0.35   26.53 d0.23   26.14 d0.22   26.12 d0.32
wl3    26.30 d0.25   26.29 d0.26   25.94 d0.25   26.17 d0.30
wl4    26.26 d0.23   25.93 d0.24   25.90 d0.23   25.91 d0.29
wl5    25.81 d0.33   25.88 d0.25   25.63 d0.27   25.59 d0.26

Clearly, CMWQ extends workqueue so that it can serve as a robust async mechanism that can be used (mostly) universally without introducing any noticeable performance degradation.

 
