I/O Management and Disk Scheduling
11.1 - I/O Devices
I/O devices categories
Human readable - used to communicate with the user
- e.g. printers, display, keyboard etc.
Machine readable - used to communicate with electronic equipment
- e.g. disk drives, USB keys, controllers etc.
Communication - used to communicate with remote devices
- e.g. modems, digital line drivers etc.
Key concepts
Data rate - data transfer rates differ by several orders of magnitude across devices
Application - the use to which a device is put has an influence on the software and policies in the OS and supporting utilities
- e.g. a disk used for files requires file management software
- e.g. a disk used to store virtual memory pages depends on the virtual memory scheme
Complexity of control - the complexity of the I/O modules that control the different devices
- e.g. Disk requires a more complex control interface than a printer
Unit of transfer - data may be transferred as a stream of bytes or characters (terminal I/O) or in larger blocks (disk I/O)
Data representation - different data encoding schemes are used by different devices
- Difference in code and parity conventions
Error conditions - nature of errors, way they are reported, consequences, and the available responses are different in different devices
11.2 - Organization of the I/O Function
Three techniques of waiting for I/O
Programmed I/O - the processor issues an I/O command, on behalf of a process, to an I/O module; the process then busy waits for the operation to be completed before proceeding
Interrupt-driven I/O - the processor issues an I/O command on behalf of a process
- If the I/O instruction from the process is nonblocking, then the processor continues to execute instructions from the process that issued the I/O command
- If the instruction is blocking, then the next instruction that the processor executes is from the OS, which will put the current process in a blocked state and schedule another process
Direct memory access (DMA) - a DMA module controls the exchange of data between main memory and an I/O module
- The processor sends a request for the transfer of a block of data to the DMA module and is interrupted only after the entire block has been transferred
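A minimal sketch of the difference between the first two techniques above, using a Python thread and event purely as stand-ins for a device and its interrupt line (all names here are illustrative, not an OS API):

    import threading
    import time

    def device_transfer(done: threading.Event) -> None:
        """Stand-in for an I/O module moving a block of data."""
        time.sleep(0.01)   # simulated device latency
        done.set()         # analogous to raising an interrupt

    # Programmed I/O: the "processor" spins on the status flag.
    done = threading.Event()
    threading.Thread(target=device_transfer, args=(done,)).start()
    while not done.is_set():
        pass               # cycles wasted polling the device status

    # Interrupt-driven style: block until signaled, wasting no cycles
    # (a real OS would schedule another process in the meantime).
    done = threading.Event()
    threading.Thread(target=device_transfer, args=(done,)).start()
    done.wait()            # "processor" is free until the signal arrives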
Evolution of the I/O Function
- The processor directly controls a peripheral device
- A controller or I/O module is used; Processor uses programmed I/O without interrupts
- Interrupts deployed
- I/O module given direct control of memory via DMA
- Enhanced I/O module that acts as a separate processor (I/O Channel), executing I/O instructions without processor intervention
- I/O module has its own memory (I/O Processor), allowing it to control a large set of I/O devices
Direct Memory Access
- The DMA module mimics the processor and takes over control of the system bus
- The processor sends information to the DMA module when it wants to read a block of data
- Read or write (read/write control line)
- Address of the I/O device (data lines)
- Starting location in memory (data lines, stored in the DMA module's address register)
- Number of words (data lines, stored in the data count register)
- The DMA module then transfers the data without the involvement of the processor. It sends an interrupt to the processor after the transfer is done.
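A toy sketch of this handshake, modeling the DMA module's registers as plain attributes (the class, register names, and callback are made up for illustration):

    class DMAModule:
        """Toy DMA model: the processor only programs the registers;
        the module moves the whole block, then raises one interrupt."""

        def __init__(self, memory: bytearray, device_data: bytes):
            self.memory = memory              # main memory
            self.device_data = device_data    # data sitting on the device

        def start_read(self, start_addr: int, count: int, on_complete) -> None:
            # The processor has supplied: direction, device address,
            # starting memory location, and word count.
            self.memory[start_addr:start_addr + count] = self.device_data[:count]
            on_complete()                     # interrupt only after the whole block

    memory = bytearray(64)
    dma = DMAModule(memory, b"block-of-device-data")
    dma.start_read(start_addr=8, count=12, on_complete=lambda: print("IRQ: done"))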
(Figure: typical DMA block diagram)
DMA Configurations
Single-bus, detached DMA - all modules share the same system bus
- The DMA module uses programmed I/O to exchange data between memory and an I/O module
- Inexpensive but inefficient
Single-bus, integrated DMA-I/O - integrates the DMA and some I/O functions, with paths between the DMA module and I/O modules that do not involve the system bus
I/O bus - connects the I/O modules to the DMA module through a dedicated I/O bus
- Only one I/O interface in the DMA module
- Easily expandable
11.3 - Operating System Design Issues
Design Objectives
Efficiency - I/O is often a bottleneck in a computing system
Generality - it is desirable to handle all devices in a uniform manner
Logical Structure of the I/O Function
- A hierarchical structure
Layers involved in communicating with a local peripheral device
Logical I/O - deals with the device as a logical resource and is not concerned with the details of actually controlling the device
- Manages general I/O functions on behalf of user processes
- Simple commands
Device I/O - the requested operations and data are converted into appropriate sequences of I/O instructions, channel commands, and controller orders
Scheduling and control - queueing, scheduling, and control of I/O operations
- Interrupts are handled in this layer
- I/O status collected and reported
Layers for a secondary storage device that supports a file system
Directory management - symbolic file names are converted to identifiers that either reference the file directly or indirectly through a file descriptor or index table
File system - deals with the logical structure of files and with the operations that can be specified by user (open/close/read/write)
- Manages access rights
Physical organization - logical references to files and records are converted to physical secondary storage addresses
- Track and sector structure
- Secondary storage space and main storage buffers
11.4 - I/O Buffering
- Perform input transfers in advance of requests being made, and perform output transfers some time after the request is made
- Avoids blocking the process during the transfer
- Ensures the system can swap the process out while I/O is in progress
Block-oriented device - stores information in blocks that are usually of fixed size, and transfers are made one block at a time
- e.g. disks and USB keys
Stream-oriented device - transfer data in and out as a stream of bytes, with no block structures
- e.g. terminals, printers, communication ports etc.
Single Buffer
- The OS assigns a buffer in the system portion of main memory to the operation
Reading ahead / anticipated input (block-oriented devices)
- Input transfers are made to the system buffer
- When transfer is complete, the process moves the block into user space and immediately requests another block
Based on the assumption that the block will eventually be used
For a stream-oriented I/O, the buffer can be used to hold a single line (line-at-a-time I/O)
- An alternative is byte-at-a-time I/O
The system can swap the process out since the input operation is taking place in system memory rather than process memory
Complicates the OS logic as it must keep track of the assignment of system buffers to user processes
Double Buffer
- Assigning two system buffers to the operation
- Referred to as double buffering or buffer swapping
A process can transfer data to (or from) one buffer while the operating system empties (or fills) the other
Ensures that the process will not have to wait on I/O
Increased complexity
line-at-a-time I/O
- User process won't be suspended for input or output, unless the process runs ahead of the double buffers
byte-at-a-time I/O
- No advantage over a single buffer with double the length
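A simplified sketch of the buffer-swapping idea: the process drains one buffer while the other is refilled. In a real OS the refill happens concurrently with the consuming process; this sequential sketch only alternates the two roles (read_block and the buffer layout are illustrative):

    def read_block(block_no: int) -> bytes:
        """Stand-in for one device read into a system buffer."""
        return f"block-{block_no}".encode()

    buffers = [read_block(0), read_block(1)]   # both buffers prefilled

    for i in range(4):
        data = buffers[i % 2]                  # process consumes one buffer...
        buffers[i % 2] = read_block(i + 2)     # ...while the "OS" refills it
        print(data.decode())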
Circular Buffer
- Uses more than two buffers, each one unit in a circular buffer
- Used when double buffering is inadequate, e.g. for processes with rapid bursts of I/O
11.5 - Disk Scheduling
Seek time - the time for the disk head to reach the correct track
- Consists of the initial startup time and the time taken to traverse the tracks that have to be crossed once the access arm is up to speed
- Under 10ms on average
Rotational delay - the time it takes for the beginning of the sector to reach the head
- 2ms on average
Access time - the time to get into the position to start read/write; seek time + rotational delay
Transfer time - time required for the read/write process
- T = b / (rN)
- b = number of bytes to be transferred
- N = number of bytes on a track
- r = rotation speed (revolutions per second)
Total average access time: Ta = Ts + 1/(2r) + b/(rN), where Ts is the average seek time
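A short worked example with assumed numbers (average seek time 4 ms, 7,500 rpm, 500,000 bytes per track, a 2,500-byte transfer):

    Ts = 0.004           # average seek time: 4 ms (assumed)
    r = 7500 / 60        # 7,500 rpm -> 125 revolutions per second
    N = 500_000          # bytes per track (assumed)
    b = 2_500            # bytes to be transferred (assumed)

    rotational_delay = 1 / (2 * r)   # 0.004 s = 4 ms on average
    transfer_time = b / (r * N)      # 0.00004 s = 0.04 ms
    Ta = Ts + rotational_delay + transfer_time
    print(f"Ta = {Ta * 1000:.2f} ms")   # Ta = 8.04 ms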
Disk Scheduling Policies
Random scheduling - selects I/O requests from the queue at random; used as a benchmark for evaluating other methods
First In First Out
- Processes requests from the queue in sequential order
- Fair
- Performs well when most requests target clustered file sectors or when only a few processes require access
- Does not perform well when many processes are competing for the disk
- Often approximates random scheduling in performance
Priority
- Control of scheduling is outside the control of disk management software
- Not intended to optimize disk utilization
- Short batch jobs and interactive jobs are given higher priority than longer jobs
- Long jobs may face starvation
Last in First Out
- Takes advantage of locality in transaction-processing systems
- Gives the device to the most recent user
- Requests may face starvation in a busy system
Shortest Service Time First
- Selects disk I/O requests that require the least movement of the disk arm from its current position
- Minimize seek time
- A random tie-breaking algorithm may be needed when two requests are equidistant in opposite directions
- Better performance than FIFO
Scan
- The arm moves in one direction only, satisfying requests along the way, until it reaches the last track or until there are no more requests in that direction (the LOOK variant); then the direction reverses
- Prevents starvation
- Performs similarly to SSTF on average (see the comparison sketch below)
- Biased against the area most recently traversed (not utilizing locality as well as SSTF)
C-Scan
- Restricts scan to one direction only
- Upon reaching the last track, the arm returns to the opposite end of the disk and the scan begins again
- Reduces the maximum delay experienced by new requests
N-Step-SCAN and FSCAN
- To prevent a process from monopolizing the arm by repeatedly requesting a particular track
- N-step-SCAN - segments the disk request queue into subqueues of length N
- Subqueues are processed one at a time using SCAN
- New requests are added to other subqueues while the current queue is being processed
- FSCAN - use two subqueues
- All of the requests are in one of the queues when a scan begins
- All new requests arriving during the scan enter the other queue
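A compact sketch comparing total arm movement under FIFO, SSTF, and SCAN for the same sample queue (the track numbers and starting head position are made up; the SCAN here is the LOOK variant, sweeping toward higher tracks first):

    def total_movement(start: int, order: list[int]) -> int:
        moves, pos = 0, start
        for track in order:
            moves += abs(track - pos)
            pos = track
        return moves

    def sstf(start: int, requests: list[int]) -> list[int]:
        pending, pos, order = list(requests), start, []
        while pending:
            nearest = min(pending, key=lambda t: abs(t - pos))
            pending.remove(nearest)
            order.append(nearest)
            pos = nearest
        return order

    def scan(start: int, requests: list[int]) -> list[int]:
        ahead = sorted(t for t in requests if t >= start)
        behind = sorted((t for t in requests if t < start), reverse=True)
        return ahead + behind          # sweep up, then service the rest going down

    queue, head = [55, 58, 39, 18, 90, 160, 150, 38, 184], 100
    print("FIFO:", total_movement(head, queue))              # 498 tracks
    print("SSTF:", total_movement(head, sstf(head, queue)))  # 248 tracks
    print("SCAN:", total_movement(head, scan(head, queue)))  # 250 tracks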
11.6 - RAID
- A set of physical disk drives viewed by the OS as a single logical drive
- Data are distributed across the physical drives of an array in a scheme known as striping
- Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure
RAID 0
- User and system data are distributed across all of the disks in the array
- Data is divided into strips and distributed across the disks in round-robin fashion
- stripe - a set of logically consecutive strips that maps exactly one strip to each array member
- High transfer capacity (read/write)
- High I/O request rate
- No redundancy
RAID 1
- Each logical strip is mapped to two separate physical disks (all data is duplicated)
- A read request can be served by either of the two disks; the one with the lower seek time is selected
- A write request updates both corresponding strips in parallel (write time is dictated by the slower disk)
- Easy data recovery
- Always needs twice the physical disk space of the logical disk
- High I/O request rates for reads
RAID 2
- Data striping used
- Strips are very small
- Error-correcting code calculated across corresponding bits on each data disk, bits of the code are stored in the corresponding bit positions on multiple parity disks
- Costly (although less than RAID 1)
- Number of parity disks is proportional to the log of the number of data disks
- Only an effective choice in environments where many disk errors occur
RAID 3
- Similar to RAID 2 but only one parity disk is required
- In the case of a disk failure, data can be recovered from the parity disk and the remaining disks
- Uses an XOR function on the corresponding bits (see the sketch below)
- High data transfer rates
- Only one I/O request can be executed at a time
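A minimal demonstration of the XOR parity used by RAID 3 (and, strip-wise, by RAID 4/5): the parity strip is the XOR of the data strips, so any single lost strip equals the XOR of the survivors and the parity (strip contents here are arbitrary):

    def xor_strips(strips: list[bytes]) -> bytes:
        """Byte-wise XOR across equal-length strips."""
        out = bytearray(len(strips[0]))
        for strip in strips:
            for i, byte in enumerate(strip):
                out[i] ^= byte
        return bytes(out)

    data = [b"AAAA", b"BBBB", b"CCCC"]   # strips on three data disks
    parity = xor_strips(data)            # stored on the parity disk

    lost = data[1]                       # suppose disk 1 fails
    recovered = xor_strips([data[0], data[2], parity])
    assert recovered == lost             # rebuilt from survivors + parity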
RAID 4
- Uses an independent-access technique that allows each disk to operate independently
- Independent I/O requests can be satisfied in parallel
- High I/O request rates for reads
- Large data strips
- Bit-by-bit parity strip stored on the parity disk
- Incurs a write penalty when small I/O write requests are performed
- Both the user data and the corresponding parity bits must be updated
RAID 5
- Similar to RAID 4
- The parity strips are distributed across all disks in a round-robin scheme
- Avoids the I/O bottleneck of a single parity disk
RAID 6
- Two different parity calculations are performed and stored in separate blocks on different disks
- Thus requires the capacity of two additional disks
- Very high data availability
- Can tolerate up to 2 disk failures
- Good read performance
- Significant write penalties compared to RAID 5
11.7 - Disk Cache
- A buffer in main memory for disk sectors
- Similar to cache memory but for disk data
Data Delivery
How to deliver the data to the requesting process
- Transfer the data from the disk cache to the process's memory
- Alternatively, use shared memory and pass the process a pointer to the desired data in the disk cache (avoids a memory-to-memory copy)
Replacement Strategy
- Least recently used (LRU) is the most common strategy used
- Least frequently used (LFU) is another possibility
- Implemented using a counter
- A technique known as frequency-based replacement is used to overcome locality problems of LFU
- Uses a stack
- When there is a cache hit, the referenced block is moved to the top of the stack
- If the block was already in the new section, its reference count is not incremented; else it increments by 1
- The block with the smallest reference count in the old section is chosen for replacement; the least recently used such block is chosen in the event of a tie
- Only achieves small performance improvement over LRU
- With only two sections, blocks have an inadequate interval to build up their reference counts, even if they are referenced relatively frequently
- Dividing the stack into three sections (new, middle, old) improves performance, giving blocks in the middle section a chance to build up reference counts before becoming eligible for replacement
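A minimal sketch of an LRU disk cache, the most common strategy above (the capacity, sector keys, and read_sector fallback are illustrative):

    from collections import OrderedDict

    class DiskCache:
        """Tiny LRU cache of disk sectors held in main memory."""

        def __init__(self, capacity: int, read_sector):
            self.capacity = capacity
            self.read_sector = read_sector   # falls through to real disk I/O
            self.cache: OrderedDict[int, bytes] = OrderedDict()

        def get(self, sector: int) -> bytes:
            if sector in self.cache:
                self.cache.move_to_end(sector)   # hit: now most recently used
                return self.cache[sector]
            data = self.read_sector(sector)      # miss: one real disk read
            self.cache[sector] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
            return data

    cache = DiskCache(capacity=2, read_sector=lambda s: f"sector-{s}".encode())
    cache.get(1); cache.get(2); cache.get(1); cache.get(3)   # evicts sector 2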
11.9 - Linux I/O
Disk Scheduling
- Uses the Linux Elevator by default
- A variation on the LOOK algorithm
- Augmented by the deadline I/O scheduler and the anticipatory I/O scheduler
The Elevator Scheduler
Maintains a single queue for read/write requests
- Keeps the list of requests sorted by block number
If the request is to the same on-disk sector or an adjacent sector to a pending request in the queue, then the existing request and the new request are merged into one request
- If a request in the queue is sufficiently old, the new request is inserted at the tail of the queue
- If there is a suitable location, the new request is inserted in sorted order
- If there is no suitable location, the new request is placed at the tail of the queue
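A rough sketch of those insertion rules (merging is reduced to sector adjacency, request sizes are ignored, and the age threshold is an arbitrary stand-in for "sufficiently old"):

    import bisect
    import time

    MAX_AGE = 0.5   # "sufficiently old" threshold in seconds (illustrative)
    queue: list[tuple[int, float]] = []   # (sector, arrival time), sector-sorted

    def add_request(sector: int) -> None:
        now = time.monotonic()
        # 1. Merge with a pending request for the same or an adjacent sector.
        if any(abs(s - sector) <= 1 for s, _ in queue):
            return                            # folded into the existing request
        # 2. If anything queued is sufficiently old, append at the tail
        #    so the old request is not pushed further back by sorting.
        if queue and now - min(t for _, t in queue) > MAX_AGE:
            queue.append((sector, now))
            return
        # 3. Otherwise insert in sector-sorted order.
        bisect.insort(queue, (sector, now))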
Deadline Scheduler
- Uses three queues
- Each incoming request is placed in the sorted elevator queue
- Also placed at the tail of the read FIFO queue or the write FIFO queue, depending on the type of the request
- Each request is assigned an expiration time
- Normally the scheduler dispatches from the sorted queue
- The request is removed from all queues once satisfied
- When the request at the head of one of the FIFO queues expires, the scheduler dispatches from that queue, processing the expired request along with the next few requests from that queue
- Overcomes starvation of requests for distant blocks and the lack of a distinction between reads and writes
Anticipatory I/O Scheduler
- The original elevator and deadline schedulers perform poorly when there are numerous synchronous read requests
- The anticipatory I/O scheduler is superimposed on the deadline scheduler
- Causes a delay of 6ms when a read request is dispatched to see if the application that issued the last read request will issue another read request to the same region of the disk (locality)
- If such a request arrives, it is processed immediately
Linux Page Cache
- In Linux 2.2 and earlier, the kernel maintained a page cache for reads and writes from regular file system files and for virtual memory pages, and a separate buffer cache for block I/O
- Since Linux 2.4 there is a single unified page cache involved in all traffic between disk and main memory
- Writes of a collection of dirty pages can be ordered properly and carried out efficiently
- Due to temporal locality, pages in the page cache are likely to be referenced again before they are flushed from the cache, saving disk I/O operations
Dirty pages are written back to disk in two situations
- When free memory falls below a specified threshold
- When dirty pages grow older than a specified threshold