	Mandatory File Locking For The Linux Operating System

		Andy Walker <andy@lysaker.kvaerner.no>

			   06 April 1996


What is mandatory locking?
--------------------------

Mandatory locking is kernel enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine, which is a wrapper around fcntl()). It is
normally a process' responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.
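
The cooperative pattern described above (lock, update, unlock) might look
roughly like the following. This is only a sketch: the helper names
lock_whole_file() and append_locked() are invented for illustration, and a
real mail agent would need far more error handling.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply (F_RDLCK, F_WRLCK) or release (F_UNLCK) an advisory lock covering
 * the whole file. F_SETLKW waits until any conflicting lock is released. */
static int lock_whole_file(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "up to end of file" */
	return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: append one message to a mailbox while holding a
 * write lock. Returns 0 on success, -1 on error. */
static int append_locked(const char *path, const char *msg)
{
	size_t len;
	int fd, ret = -1;

	assert(path && msg);
	len = strlen(msg);
	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (lock_whole_file(fd, F_WRLCK) == 0) {
		if (write(fd, msg, len) == (ssize_t)len)
			ret = 0;
		lock_whole_file(fd, F_UNLCK);
	}
	close(fd);
	return ret;
}
```

This only protects the mailbox if every program that touches it follows the
same protocol, which is exactly the weakness of advisory locking.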

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts either to read or to write a
file that another process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.
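
As an illustration of the byte-range granularity mentioned in Note 1, the
sketch below locks only part of a file through struct flock. The helper name
lock_range() is hypothetical; the fcntl() command and the struct flock fields
are the standard ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to lock bytes [start, start + len) of fd without waiting.
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Returns 0 on success; on conflict with another process's lock it
 * returns -1 with errno set to EAGAIN or EACCES. */
static int lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl;

	assert(start >= 0 && len > 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;	/* l_start is an absolute offset */
	fl.l_start = start;
	fl.l_len = len;
	return fcntl(fd, F_SETLK, &fl);
}
```

Two processes can hold write locks on the same file at once with this helper,
provided the ranges they ask for do not overlap.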

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V.

Marking a file for mandatory locking
------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

Available implementations
-------------------------

I originally intended to base my implementation on SunOS, only to find out that
the implementation in SunOS 4.1.1 (the latest version that will work on my Sun
3/80 at home) is completely hopeless.

For one thing, calls to open() for a file fail with EAGAIN if another process
holds a mandatory lock on the file. However, processes already holding open
file descriptors can carry on using them. Weird!

In addition, SunOS doesn't seem to honour the O_NONBLOCK (O_NDELAY) flag for
mandatory locks, so reads and writes to locked files always block when they
should probably return EAGAIN.

I found some test code online, which I think comes from one of Stevens' UNIX
programming books, that confirmed what I considered to be the "obvious"
semantics, described below. I shall attempt to run my test programs on other
platforms to verify my implementation.

Personally I feel that this is such an esoteric area that these semantics are
just as valid as any others, so long as the main points seem to agree.

Semantics
---------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.
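
Rules 2 and 3 reduce to a small decision table. The sketch below is only a
model of those rules as stated here, not kernel code and not something that
calls into the kernel; the enum and function names are invented for
illustration.

```c
#include <assert.h>
#include <fcntl.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE };
enum outcome { PROCEED, BLOCK, FAIL_EAGAIN };

/* What happens to a read or write that touches a region on which another
 * process holds a mandatory lock of type lock_type (F_RDLCK or F_WRLCK)?
 * open_flags are the flags the caller opened the file with. */
static enum outcome mandatory_outcome(short lock_type,
				      enum access_kind access,
				      int open_flags)
{
	assert(lock_type == F_RDLCK || lock_type == F_WRLCK);

	/* Rule 2: a read lock still lets other processes read. */
	if (lock_type == F_RDLCK && access == ACCESS_READ)
		return PROCEED;

	/* Rules 2 and 3: every other combination conflicts. */
	return (open_flags & O_NONBLOCK) ? FAIL_EAGAIN : BLOCK;
}
```

For example, a write against a read-locked region with O_NONBLOCK set yields
FAIL_EAGAIN, while the same write without O_NONBLOCK yields BLOCK.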

Which system calls are affected?
--------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), truncate() and ftruncate(). truncate() and
ftruncate() are considered to be "write" actions for the purposes of mandatory
locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)
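
The region arithmetic described above can be written out as follows. This is
a sketch of the definitions in this section only; struct region and the
function names are hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <sys/types.h>

struct region {
	off_t start;	/* first affected byte */
	off_t len;	/* number of affected bytes */
};

/* read()/write(): count bytes starting at the current position. */
static struct region io_region(off_t pos, off_t count)
{
	struct region r = { pos, count };

	assert(pos >= 0 && count >= 0);
	return r;
}

/* truncate()/ftruncate(): the bytes removed (shrink) or added (grow). */
static struct region truncate_region(off_t old_size, off_t new_size)
{
	struct region r;

	assert(old_size >= 0 && new_size >= 0);
	if (new_size < old_size) {
		r.start = new_size;
		r.len = old_size - new_size;
	} else {
		r.start = old_size;
		r.len = new_size - old_size;
	}
	return r;
}

/* Does a lock on [lock_start, lock_start + lock_len) overlap r?
 * As with fcntl(), lock_len == 0 means "to end of file and beyond". */
static int region_is_locked(struct region r, off_t lock_start, off_t lock_len)
{
	if (r.len == 0)
		return 0;
	if (lock_len != 0 && lock_start + lock_len <= r.start)
		return 0;
	return lock_start < r.start + r.len;
}
```

Growing a file from 400 to 1000 bytes gives the region starting at byte 400
with length 600, which a whole-file lock (start 0, length 0) overlaps even
though those bytes did not exist when the lock was taken.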

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

Warning!
--------

Not even root can override a mandatory lock, so a runaway process can wreak
havoc if it locks crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(
