Splice

A file sits on disk. Your application reads it and sends it over the network. Simple enough—but behind this mundane operation hides one of computing’s most persistent performance bottlenecks. In a traditional I/O path, that single file traverses through four distinct memory copies before reaching the network interface. The kernel reads data from disk into a kernel buffer via DMA. The read() system call copies it to user space. The write() system call copies it back to a kernel socket buffer. Finally, DMA transfers it to the NIC. Each copy consumes CPU cycles, memory bandwidth, and cache space. ...