Linux驱动开发之阻塞与非阻塞IO

简述

在Linux内核中，I/O操作可以是阻塞或非阻塞的。

阻塞I/O是指进程在进行I/O操作时会一直等待直到操作完成。在这种情况下，进程会被挂起，直到I/O操作完成。这种方式可以确保操作的完整性，但可能会导致进程长时间等待，从而影响应用程序的响应速度。

非阻塞I/O是指进程在进行I/O操作时不会一直等待，而是立即返回。如果I/O操作还没有完成，则进程可以继续执行其他操作。这种方式可以提高应用程序的响应速度，但可能会导致I/O操作不完整或错误。关系如下：

在Linux内核中，可以使用select、poll、epoll等系统调用来实现非阻塞I/O。这些系统调用可以监视多个文件描述符，并在其中任何一个变为“就绪”时通知进程。这样，进程可以在I/O操作完成之前继续进行其他操作，而不必一直等待。

阻塞I/O适用于需要确保操作完整性的情况，而非阻塞I/O适用于需要提高应用程序响应速度的情况。

应用如何设置

应用程序可以使用fcntl系统调用来设置文件描述符的属性，从而指定是阻塞还是非阻塞模式。

在阻塞模式下，应用程序可以使用如下代码将文件描述符设置为阻塞模式：

int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);

在非阻塞模式下，应用程序可以使用如下代码将文件描述符设置为非阻塞模式：

int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

等待队列

在 Linux 内核中，等待队列是一种机制，用于在进程等待某些事件发生时进行同步。

当一个进程等待某些条件时，它会调用一个等待函数，例如 wait_event 或 wait_event_interruptible。这个函数会将进程添加到等待队列中，并将进程挂起，直到条件满足。当条件满足时，等待队列中的进程将被唤醒，并继续执行。

等待队列还可以使用信号量或读写信号量进行同步。当一个进程等待一个信号量时，它会将自己添加到等待队列中，并在信号量可用时被唤醒。类似地，当一个进程等待一个读写信号量时，它会将自己添加到等待队列中，并在读写信号量可用时被唤醒。

总之，等待队列是 Linux 内核中实现同步的重要机制之一。它们可以帮助进程等待某些条件，并在条件满足时进行同步。在编写内核代码时，理解等待队列的概念和用法非常重要。

等待队列在 Linux 内核中是通过 wait.h 头文件来定义和实现的。这个头文件包含了许多与等待队列相关的数据结构、函数和宏定义。

以下是一些 wait.h 头文件中常用的数据结构：

wait_queue_head_t：等待队列头结构体，用于管理等待队列中的等待进程。
wait_queue_t：等待队列项结构体，用于表示等待队列中的一个等待进程。
wait_bit_queue_t：位等待队列项结构体，用于表示等待队列中的一个等待进程，它与位图相关。

以下是一些 wait.h 头文件中常用的函数：

init_waitqueue_head(wait_queue_head_t *)：初始化等待队列头。
DECLARE_WAITQUEUE(name, current): 定义一个等待队列。
add_wait_queue(wait_queue_head_t *, wait_queue_t *)：将一个等待队列添加到等待队列头。
remove_wait_queue(wait_queue_head_t *, wait_queue_t *)：将等待队列从等待队列头中移除。
wake_up(wait_queue_head_t *)
wake_up_nr(wait_queue_head_t *, unsigned int mode)
wake_up_all(wait_queue_head_t *)
wake_up_locked(wait_queue_head_t *)
wake_up_all_locked(wait_queue_head_t *)
wake_up_interruptible(wait_queue_head_t *)
wake_up_interruptible_nr(wait_queue_head_t *, unsigned int mode)
wake_up_interruptible_all(wait_queue_head_t *)
wake_up_interruptible_sync(wait_queue_head_t *)
wait_event(wait_queue_head_t *, unsigned condition)：等待条件(第二个参数)发生。
wait_event_timeout(wait_queue_head_t *, unsigned condition)：等待条件发生，但是在超时时间内没有发生时返回。
wait_event_interruptible(wait_queue_head_t *, unsigned condition)：等待条件发生，但是可以被信号中断。
wait_event_interruptible_timeout(wait_queue_head_t *, unsigned condition)：等待条件发生，但是可以被信号中断，并且在超时时间内没有发生时返回。

除了上述函数之外，wait.h 头文件还定义了许多与等待队列相关的宏定义，例如 wait_event_interruptible_exclusive()、wait_event_interruptible_exclusive_timeout() 等。

阻塞实现

定义和初始化：

wait_queue_head_t g_queue;
init_waitqueue_head(&g_queue);
unsigned int g_key = 0;   /* 按键 0：没有 1：按键按下 */

进程唤醒

wake_up_interruptible(&g_queue);

进程挂起

使用等待队列+schedule()

    if(filp->f_flags & O_NONBLOCK) {
        if(g_key == 0)
            ret = -EAGAIN;
        goto out;
    } else {
        DECLARE_WAITQUEUE(wait, current);   /* 定义一个等待队列 */
        if(g_key == 0) {
            add_wait_queue(&g_queue, &wait);    /* 添加到等待队列头 */
            __set_current_state(TASK_INTERRUPTIBLE);/* 设置任务状态 */
            schedule();                             /* 挂起进程 */
            if(signal_pending(current)) {           /* 判断是否为信号引起的唤醒 */
                ret = -ERESTARTSYS;
                goto wait_error;
            }
            set_current_state(TASK_RUNNING);        /*设置为运行状态 */
            remove_wait_queue(&g_queue, &wait); /*将等待队列移除 */
        }
    }

完全使用等待队列

    if(filp->f_flags & O_NONBLOCK) {
        if(g_key == 0)
            ret = -EAGAIN;
        goto out;
    } else {
        ret = wait_event_interruptible(g_queue, g_key);
    }

wait_event_interruptible系列函数里面就是把schedule()函数封装了一层，所以我们在使用时可以直接使用wait_event_interruptible系列函数。

非阻塞

设备驱动中的轮询编程

设备驱动中poll（）函数的原型是：

unsigned int(*poll)(struct file * filp, struct poll_table* wait);

第1个参数为file结构体指针，第2个参数为轮询表指针。这个函数应该进行两项工作。

对可能引起设备文件状态变化的等待队列调用poll_wait（）函数，将对应的等待队列头部添加到poll_table中。
返回表示是否能对设备进行无阻塞读、写访问的掩码。

设备驱动中poll（）函数的典型模板，如下所示

static unsigned int xxx_poll(struct file *filp, poll_table *wait)
{
 unsigned int mask = 0;
 struct xxx_dev *dev = filp->private_data;      /* 获得设备结构体指针*/

 ...
 poll_wait(filp, &dev->r_wait, wait);           /* 加入读等待队列 */
 poll_wait(filp, &dev->w_wait, wait);           /* 加入写等待队列 */

 if (...)                                       /* 可读 */
     mask |= POLLIN | POLLRDNORM;               /* 标示数据可获得（对用户可读）*/

 if (...)                                       /* 可写 */
     mask |= POLLOUT | POLLWRNORM;              /* 标示数据可写入*/
 ...
 return mask;
}

应用程序的轮询编程

应用程序中最广泛用到的是BSD UNIX中引入的select（）系统调用，其原型为：

int select(int numfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

poll（）的功能和实现原理与select（）相似，其函数原型为：

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

当多路复用的文件数量庞大、I/O流量频繁的时候，一般不太适合使用select（）和poll（），此种情况下，select（）和poll（）的性能表现较差，我们宜使用epoll。epoll的最大好处是不会随着fd的数目增长而降低效率，select（）则会随着fd的数量增大性能下降明显。

与epoll相关的用户空间编程接口包括：

struct epoll_event {
  __uint32_t events;  /* Epoll events */
  epoll_data_t data;  /* User data variable */
};
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

位于https://www.kernel.org/doc/ols/2004/ols2004v1-pages-215-226.pdf的文档《Comparing and Evaluating epoll，select，and poll Event Mechanisms》对比了select、epoll、poll之间的一些性能。一般来说，当涉及的fd数量较少的时候，使用select是合适的；如果涉及的fd很多，如在大规模并发的服务器中侦听许多socket的时候，则不太适合选用select，而适合选用epoll。

上篇Linux驱动开发之并发控制

下篇Linux驱动开发之异步通知