OpenSuSE Man Pages

home | help
x SuSE Linux 13.1-RELEASE x
x SuSE Linux 13.1-RELEASEx
SCHED(7)                   Linux Programmer's Manual                  SCHED(7)

NAME
       sched - overview of scheduling APIs

DESCRIPTION
   API summary
       The Linux scheduling APIs are as follows:

       sched_setscheduler(2)
              Set the scheduling policy and parameters of a specified thread.

       sched_getscheduler(2)
              Return the scheduling policy of a specified thread.

       sched_setparam(2)
              Set the scheduling parameters of a specified thread.

       sched_getparam(2)
              Fetch the scheduling parameters of a specified thread.

       sched_get_priority_max(2)
              Return  the minimum priority available in a specified scheduling
              policy.

       sched_get_priority_min(2)
              Return the maximum priority available in a specified  scheduling
              policy.

       sched_rr_get_interval(2)
              Fetch  the quantum used for threads that are scheduled under the
              "round-robin" scheduling policy.

       sched_yield(2)
              Cause the caller to relinquish  the  CPU,  so  that  some  other
              thread be executed.

       sched_setaffinity(2)
              (Linux-specific) Set the CPU affinity of a specified thread.

       sched_getaffinity(2)
              (Linux-specific) Get the CPU affinity of a specified thread.

       sched_setattr(2)
              Set  the scheduling policy and parameters of a specified thread.
              This (Linux-specific) system call provides  a  superset  of  the
              functionality of sched_setscheduler(2) and sched_setparam(2).

       sched_getattr(2)
              Fetch  the  scheduling  policy  and  parameters  of  a specified
              thread.  This (Linux-specific) system call provides  a  superset
              of  the  functionality  of  sched_getscheduler(2) and sched_get-
              param(2).

   Scheduling policies
       The scheduler is the  kernel  component  that  decides  which  runnable
       thread will be executed by the CPU next.  Each thread has an associated
       scheduling policy and a  static  scheduling  priority,  sched_priority.
       The  scheduler makes its decisions based on knowledge of the scheduling
       policy and static priority of all threads on the system.

       For threads scheduled under  one  of  the  normal  scheduling  policies
       (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
       scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
       SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
       (high).  (As the numbers imply, real-time threads  always  have  higher
       priority  than  normal  threads.)   Note well: POSIX.1-2001 requires an
       implementation to support only a minimum 32  distinct  priority  levels
       for  the real-time policies, and some systems supply just this minimum.
       Portable   programs   should    use    sched_get_priority_min(2)    and
       sched_get_priority_max(2) to find the range of priorities supported for
       a particular policy.

       Conceptually, the scheduler maintains a list of  runnable  threads  for
       each possible sched_priority value.  In order to determine which thread
       runs next, the scheduler looks for the nonempty list with  the  highest
       static priority and selects the thread at the head of this list.

       A  thread's scheduling policy determines where it will be inserted into
       the list of threads with equal static priority and  how  it  will  move
       inside this list.

       All scheduling is preemptive: if a thread with a higher static priority
       becomes ready to run, the currently running thread  will  be  preempted
       and  returned  to  the  wait  list  for its static priority level.  The
       scheduling policy determines the  ordering  only  within  the  list  of
       runnable threads with equal static priority.

   SCHED_FIFO: First in-first out scheduling
       SCHED_FIFO can be used only with static priorities higher than 0, which
       means that when a SCHED_FIFO threads becomes runnable, it  will  always
       immediately  preempt any currently running SCHED_OTHER, SCHED_BATCH, or
       SCHED_IDLE thread.  SCHED_FIFO is a simple scheduling algorithm without
       time  slicing.   For threads scheduled under the SCHED_FIFO policy, the
       following rules apply:

       *  A SCHED_FIFO thread that has been preempted  by  another  thread  of
          higher  priority  will stay at the head of the list for its priority
          and will resume execution as soon as all threads of higher  priority
          are blocked again.

       *  When  a  SCHED_FIFO  thread becomes runnable, it will be inserted at
          the end of the list for its priority.

       *  A   call    to    sched_setscheduler(2),    sched_setparam(2),    or
          sched_setattr(2)  will put the SCHED_FIFO (or SCHED_RR) thread iden-
          tified by pid at the start of the list if it  was  runnable.   As  a
          consequence,  it  may preempt the currently running thread if it has
          the same priority.  (POSIX.1-2001 specifies that the  thread  should
          go to the end of the list.)

       *  A thread calling sched_yield(2) will be put at the end of the list.

       No  other events will move a thread scheduled under the SCHED_FIFO pol-
       icy in the wait list of runnable threads with equal static priority.

       A SCHED_FIFO thread runs until either it is blocked by an I/O  request,
       it   is   preempted   by   a   higher  priority  thread,  or  it  calls
       sched_yield(2).

   SCHED_RR: Round-robin scheduling
       SCHED_RR is a simple enhancement of SCHED_FIFO.   Everything  described
       above  for SCHED_FIFO also applies to SCHED_RR, except that each thread
       is allowed to run only for a  maximum  time  quantum.   If  a  SCHED_RR
       thread  has  been running for a time period equal to or longer than the
       time quantum, it will be put at the end of the list for  its  priority.
       A  SCHED_RR  thread that has been preempted by a higher priority thread
       and subsequently resumes execution as a running  thread  will  complete
       the  unexpired  portion of its round-robin time quantum.  The length of
       the time quantum can be retrieved using sched_rr_get_interval(2).

   SCHED_DEADLINE: Sporadic task model deadline scheduling
       Since  version  3.14,  Linux  provides  a  deadline  scheduling  policy
       (SCHED_DEADLINE).   This  policy  is  currently  implemented using GEDF
       (Global Earliest Deadline First)  in  conjunction  with  CBS  (Constant
       Bandwidth  Server).   To  set  and  fetch  this  policy  and associated
       attributes,  one  must  use  the  Linux-specific  sched_setattr(2)  and
       sched_getattr(2) system calls.

       A  sporadic  task is one that has a sequence of jobs, where each job is
       activated at most once per period.  Each job also has a relative  dead-
       line,  before which it should finish execution, and a computation time,
       which is the CPU time necessary for executing the job.  The moment when
       a  task  wakes  up  because  a new job has to be executed is called the
       arrival time (also referred to as the request time  or  release  time).
       The  start  time is the time at which a task starts its execution.  The
       absolute deadline is thus obtained by adding the relative  deadline  to
       the arrival time.

       The following diagram clarifies these terms:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<- comp. time ->|
                |<------- relative deadline ------>|
                |<-------------- period ------------------->|

       When   setting   a   SCHED_DEADLINE   policy   for   a   thread   using
       sched_setattr(2), one can specify three parameters: Runtime,  Deadline,
       and  Period.   These  parameters  do  not necessarily correspond to the
       aforementioned terms: usual practice is to  set  Runtime  to  something
       bigger  than the average computation time (or worst-case execution time
       for hard real-time tasks),  Deadline  to  the  relative  deadline,  and
       Period to the period of the task.  Thus, for SCHED_DEADLINE scheduling,
       we have:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<-- Runtime ------->|
                |<----------- Deadline ----------->|
                |<-------------- Period ------------------->|

       The three deadline-scheduling parameters correspond to  the  sched_run-
       time,  sched_deadline, and sched_period fields of the sched_attr struc-
       ture; see sched_setattr(2).  These fields express value in nanoseconds.
       If  sched_period  is  specified  as  0,  then  it  is  made the same as
       sched_deadline.

       The kernel requires that:

           sched_runtime <= sched_deadline <= sched_period

       In addition, under the current implementation,  all  of  the  parameter
       values must be at least 1024 (i.e., just over one microsecond, which is
       the resolution of the implementation), and less than 2^63.  If  any  of
       these checks fails, sched_setattr(2) fails with the error EINVAL.

       The  CBS  guarantees  non-interference  between  tasks,  by  throttling
       threads that attempt to over-run their specified Runtime.

       To ensure deadline scheduling guarantees, the kernel must prevent situ-
       ations where the set of SCHED_DEADLINE threads is not feasible (schedu-
       lable) within the given  constraints.   The  kernel  thus  performs  an
       admittance  test  when  setting  or  changing SCHED_DEADLINE policy and
       attributes.  This admission test calculates whether the change is  fea-
       sible; if it is not sched_setattr(2) fails with the error EBUSY.

       For  example,  it  is required (but not necessarily sufficient) for the
       total utilization to be less than or equal to the total number of  CPUs
       available,  where,  since each thread can maximally run for Runtime per
       Period, that thread's utilization is its Runtime divided by its Period.

       In order to fulfil the guarantees that are made when a thread is admit-
       ted  to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are the high-
       est  priority  (user  controllable)  threads  in  the  system;  if  any
       SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled
       under one of the other policies.

       A call to fork(2) by a thread scheduled under the SCHED_DEADLINE policy
       will  fail  with  the error EAGAIN, unless the thread has its reset-on-
       fork flag set (see below).

       A SCHED_DEADLINE thread that calls sched_yield(2) will yield  the  cur-
       rent job and wait for a new period to begin.

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER  can be used at only static priority 0.  SCHED_OTHER is the
       standard Linux time-sharing scheduler that is intended for all  threads
       that  do  not  require the special real-time mechanisms.  The thread to
       run is chosen from the static priority 0 list based on a dynamic prior-
       ity  that is determined only inside this list.  The dynamic priority is
       based  on  the  nice  value  (set  by   nice(2),   setpriority(2),   or
       sched_setattr(2))  and  increased  for  each time quantum the thread is
       ready to run, but denied to run by the scheduler.   This  ensures  fair
       progress among all SCHED_OTHER threads.

   SCHED_BATCH: Scheduling batch processes
       (Since  Linux 2.6.16.)  SCHED_BATCH can be used only at static priority
       0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
       thread  according  to  its  dynamic priority (based on the nice value).
       The difference is that this policy will cause the scheduler  to  always
       assume  that  the thread is CPU-intensive.  Consequently, the scheduler
       will apply a small scheduling penalty with respect to wakeup behaviour,
       so that this thread is mildly disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do not
       want to lower their nice value, and for workloads that want a determin-
       istic scheduling policy without interactivity causing extra preemptions
       (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can be used only at  static  priority
       0; the process nice value has no influence for this policy.

       This  policy  is  intended  for  running jobs at extremely low priority
       (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
       policies).

   Resetting scheduling policy for child processes
       Each  thread  has  a  reset-on-fork scheduling flag.  When this flag is
       set, children created by fork(2) do not inherit  privileged  scheduling
       policies.  The reset-on-fork flag can be set by either:

       *  ORing  the  SCHED_RESET_ON_FORK  flag  into the policy argument when
          calling sched_setscheduler(2) (since Linux 2.6.32); or

       *  specifying the  SCHED_FLAG_RESET_ON_FORK  flag  in  attr.sched_flags
          when calling sched_setattr(2).

       Note  that the constants used with these two APIs have different names.
       The state of the reset-on-fork flag can analogously be retrieved  using
       sched_getscheduler(2) and sched_getattr(2).

       The  reset-on-fork feature is intended for media-playback applications,
       and can be used  to  prevent  applications  evading  the  RLIMIT_RTTIME
       resource limit (see getrlimit(2)) by creating multiple child processes.

       More  precisely,  if the reset-on-fork flag is set, the following rules
       apply for subsequently created children:

       *  If the calling thread has  a  scheduling  policy  of  SCHED_FIFO  or
          SCHED_RR, the policy is reset to SCHED_OTHER in child processes.

       *  If  the calling process has a negative nice value, the nice value is
          reset to zero in child processes.

       After the reset-on-fork flag has been enabled, it can be reset only  if
       the  thread  has the CAP_SYS_NICE capability.  This flag is disabled in
       child processes created by fork(2).

   Privileges and resource limits
       In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE)  threads
       can  set  a  nonzero  static priority (i.e., set a real-time scheduling
       policy).  The only change that an unprivileged thread can  make  is  to
       set  the SCHED_OTHER policy, and this can be done only if the effective
       user ID of the caller matches the real or effective user ID of the tar-
       get  thread  (i.e.,  the thread specified by pid) whose policy is being
       changed.

       A thread must be privileged (CAP_SYS_NICE) in order to set or modify  a
       SCHED_DEADLINE policy.

       Since  Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a ceiling
       on an unprivileged  thread's  static  priority  for  the  SCHED_RR  and
       SCHED_FIFO policies.  The rules for changing scheduling policy and pri-
       ority are as follows:

       *  If an unprivileged thread has a nonzero  RLIMIT_RTPRIO  soft  limit,
          then  it  can  change its scheduling policy and priority, subject to
          the restriction that the priority cannot be set to  a  value  higher
          than  the maximum of its current priority and its RLIMIT_RTPRIO soft
          limit.

       *  If the RLIMIT_RTPRIO soft  limit  is  0,  then  the  only  permitted
          changes  are  to lower the priority, or to switch to a non-real-time
          policy.

       *  Subject to the same rules, another unprivileged thread can also make
          these changes, as long as the effective user ID of the thread making
          the change matches the real or  effective  user  ID  of  the  target
          thread.

       *  Special  rules  apply  for  the SCHED_IDLE policy.  In Linux kernels
          before 2.6.39, an unprivileged thread operating  under  this  policy
          cannot   change   its   policy,  regardless  of  the  value  of  its
          RLIMIT_RTPRIO resource limit.  In Linux  kernels  since  2.6.39,  an
          unprivileged  thread  can  switch  to  either the SCHED_BATCH or the
          SCHED_NORMAL policy so long as its nice value falls within the range
          permitted by its RLIMIT_NICE resource limit (see getrlimit(2)).

       Privileged  (CAP_SYS_NICE)  threads  ignore the RLIMIT_RTPRIO limit; as
       with older kernels, they can make arbitrary changes to scheduling  pol-
       icy   and  priority.   See  getrlimit(2)  for  further  information  on
       RLIMIT_RTPRIO.

   Limiting the CPU usage of real-time and deadline processes
       A nonblocking infinite loop in a thread scheduled under the SCHED_FIFO,
       SCHED_RR,  or  SCHED_DEADLINE  policy will block all threads with lower
       priority forever.  Prior to Linux 2.6.25, the only way of preventing  a
       runaway  real-time  process from freezing the system was to run (at the
       console) a shell scheduled under a  higher  static  priority  than  the
       tested  application.  This allows an emergency kill of tested real-time
       applications that do not block or terminate as expected.

       Since Linux 2.6.25, there are other techniques for dealing with runaway
       real-time  and  deadline  processes.   One  of  these  is  to  use  the
       RLIMIT_RTTIME resource limit to set a ceiling on the CPU  time  that  a
       real-time process may consume.  See getrlimit(2) for details.

       Since  version  2.6.25, Linux also provides two /proc files that can be
       used to reserve a certain amount of CPU time to be  used  by  non-real-
       time  processes.   Reserving  some CPU time in this fashion allows some
       CPU time to be allocated to (say) a root shell that can be used to kill
       a  runaway  process.   Both  of  these  files  specify  time  values in
       microseconds:

       /proc/sys/kernel/sched_rt_period_us
              This file specifies a scheduling period that  is  equivalent  to
              100%  CPU bandwidth.  The value in this file can range from 1 to
              INT_MAX, giving an operating range of 1 microsecond to around 35
              minutes.   The  default  value in this file is 1,000,000 (1 sec-
              ond).

       /proc/sys/kernel/sched_rt_runtime_us
              The value in this file specifies how much of the  "period"  time
              can be used by all real-time and deadline scheduled processes on
              the system.  The value  in  this  file  can  range  from  -1  to
              INT_MAX-1.   Specifying  -1  makes  the  runtime the same as the
              period; that is, no CPU time is set aside for non-real-time pro-
              cesses (which was the Linux behavior before kernel 2.6.25).  The
              default value in this file is 950,000  (0.95  seconds),  meaning
              that 5% of the CPU time is reserved for processes that don't run
              under a real-time or deadline scheduling policy.

   Response time
       A blocked high priority thread waiting for I/O has a  certain  response
       time  before  it  is  scheduled  again.   The  device driver writer can
       greatly reduce this response time by using a "slow interrupt" interrupt
       handler.

   Miscellaneous
       Child  processes  inherit the scheduling policy and parameters across a
       fork(2).  The scheduling policy and  parameters  are  preserved  across
       execve(2).

       Memory  locking is usually needed for real-time processes to avoid pag-
       ing delays; this can be done with mlock(2) or mlockall(2).

NOTES
       Originally, Standard Linux was intended as a general-purpose  operating
       system  being able to handle background processes, interactive applica-
       tions, and less demanding  real-time  applications  (applications  that
       need  to usually meet timing deadlines).  Although the Linux kernel 2.6
       allowed for kernel preemption and the newly introduced  O(1)  scheduler
       ensures  that  the  time  needed to schedule is fixed and deterministic
       irrespective of the number of active tasks,  true  real-time  computing
       was not possible up to kernel version 2.6.17.

   Real-time features in the mainline Linux kernel
       From kernel version 2.6.18 onward, however, Linux is gradually becoming
       equipped with real-time capabilities, most of which  are  derived  from
       the  former  realtime-preempt  patches developed by Ingo Molnar, Thomas
       Gleixner, Steven Rostedt, and others.  Until the patches have been com-
       pletely  merged into the mainline kernel (this is expected to be around
       kernel version 2.6.30), they must be  installed  to  achieve  the  best
       real-time performance.  These patches are named:

           patch-kernelversion-rtpatchversion

       and  can  be  downloaded  from  <http://www.kernel.org/pub/linux/kernel
       /projects/rt/>.

       Without the patches and prior to their full inclusion into the mainline
       kernel,  the  kernel  configuration  offers  only  the three preemption
       classes CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, and  CONFIG_PRE-
       EMPT_DESKTOP  which  respectively  provide  no,  some, and considerable
       reduction of the worst-case scheduling latency.

       With the patches applied or after their full inclusion into  the  main-
       line   kernel,  the  additional  configuration  item  CONFIG_PREEMPT_RT
       becomes available.  If this is selected, Linux is  transformed  into  a
       regular  real-time  operating system.  The FIFO and RR scheduling poli-
       cies are then used to run a thread with true real-time priority  and  a
       minimum worst-case scheduling latency.

SEE ALSO
       chrt(1), getpriority(2), mlock(2), mlockall(2), munlock(2),
       munlockall(2), nice(2), sched_get_priority_max(2),
       sched_get_priority_min(2), sched_getscheduler(2), sched_getaffinity(2),
       sched_getparam(2), sched_rr_get_interval(2), sched_setaffinity(2),
       sched_setscheduler(2), sched_setparam(2), sched_yield(2),
       setpriority(2), pthread_getaffinity_np(3), pthread_setaffinity_np(3),
       sched_getcpu(3), capabilities(7), cpuset(7)

       Programming  for  the  real  world  -  POSIX.4  by Bill O. Gallmeister,
       O'Reilly & Associates, Inc., ISBN 1-56592-074-0.

       The   Linux   kernel   source   files    Documentation/scheduler/sched-
       deadline.txt,               Documentation/scheduler/sched-rt-group.txt,
       Documentation/scheduler/sched-design-CFS.txt,                       and
       Documentation/scheduler/sched-nice-design.txt

COLOPHON
       This  page  is  part of release 3.69 of the Linux man-pages project.  A
       description of the project, information about reporting bugs,  and  the
       latest     version     of     this    page,    can    be    found    at
       http://www.kernel.org/doc/man-pages/.

Linux                             2014-05-13                          SCHED(7)
Want to link to this manual page? Use this URL:
<http://star2.abcm.com/cgi-bin/bsdi-man?query=sched&sektion=7&manpath=>
home | help