Sync vs fsync — Making Sure Your Writes Actually Hit the Disk

dos2unix

Most of us have been burned by this at least once. You copy a file to a USB drive, the prompt comes back, you pull the drive, and the file is either corrupt or missing entirely. The frustrating part is that everything looked fine. No errors, no warnings. Linux just lied to you — or more accurately, you misunderstood what Linux was telling you.

This article explains what is really happening under the hood, why some tools are more reliable than others, and how to make absolutely sure your data is physically on the media before you remove it.

THE WRITE PIPELINE

When you write a file in Linux, your data does not go directly to the disk. It goes through a layered pipeline:

1. Your application writes to the kernel page cache (RAM)
2. The kernel eventually flushes the page cache to the block layer
3. The block layer submits I/O requests to the storage driver
4. The drive's internal write cache receives the data
5. The drive commits the data to its storage media

When your prompt returns after a cp command, you are typically only guaranteed that step 1 has completed. Everything below that is happening asynchronously in the background. The kernel is perfectly happy to let you keep working — or pull the drive — while steps 2 through 5 are still in progress.

This is by design. This write-back caching is one of the main reasons Linux I/O feels fast. The tradeoff is that you have to explicitly ask for a guarantee when you need one.
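Asking for that guarantee from a program can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name durable_write and the file name demo.bin are made up for the example. write() only fills a user-space buffer, flush() hands the data to the page cache, and os.fsync() blocks until the kernel reports that the device has the data.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and do not return until the kernel reports it flushed."""
    with open(path, "wb") as f:
        f.write(data)         # data sits in Python's user-space buffer
        f.flush()             # data is handed to the kernel page cache (step 1)
        os.fsync(f.fileno())  # block until the kernel reports the later steps done


if __name__ == "__main__":
    # Hypothetical target; point this at a file on the drive you care about.
    target = os.path.join(tempfile.gettempdir(), "demo.bin")
    durable_write(target, b"hello\n")
    print("flushed", os.path.getsize(target), "bytes to", target)
```

Without the os.fsync() line the function returns as soon as the data is in RAM, which is the situation described above.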

SYNC VS FSYNC — NOT THE SAME THING

Most people reach for the sync command when they want to flush buffers before removing a drive. It works, but it is important to understand what sync actually guarantees and where it falls short.

The sync command issues a global flush request to the kernel, telling it to write all dirty pages to their respective block devices. The catch is in the specification: POSIX only requires sync() to schedule those writes, so an implementation is allowed to return before they have completed. Taken literally, that is fire-and-forget.

Linux is stricter than the standard. According to the sync(2) man page, Linux has waited for I/O completion since kernel 1.3.20, so on any kernel you are likely to be running, sync does not return until the writes are done. Its reputation for being unreliable comes mostly from the wording of the standard and from other Unix systems. The real drawbacks on Linux are that sync is global, so it can stall for a long time on some unrelated slow device with dirty data, and that it cannot tell you whether one particular file made it.

The fsync() system call is a different beast entirely. When a program calls fsync() on a file descriptor, the kernel blocks until the storage device confirms that the data is physically written. Not cached in RAM, not queued in the block layer, not sitting in the drive's internal buffer — actually on the media. Only then does fsync() return.

This is the difference between "I submitted your request" and "your request is done."

CP DOES NOT FSYNC

The standard cp command writes data to the page cache and exits. It does not call fsync(). This means the moment your prompt returns, your data may exist only in RAM. The kernel will flush it eventually — sometimes in seconds, sometimes in minutes depending on dirty page writeback timers — but you have no guarantee.

You can observe this yourself. Copy a large file to a USB drive with cp, note how fast the prompt returns, and then watch iostat or check /proc/meminfo for dirty page count. The drive is still writing.
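A rough way to watch this from a second terminal is to poll the Dirty and Writeback counters in /proc/meminfo. The sketch below assumes a Linux system; the function names are made up for the example.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Return {field: value} for lines such as 'Dirty:   1234 kB'."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def watch_dirty(interval: float = 1.0, rounds: int = 10) -> None:
    """Print the Dirty and Writeback counters a few times."""
    for _ in range(rounds):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"Dirty: {info.get('Dirty', 0)} kB   "
              f"Writeback: {info.get('Writeback', 0)} kB")
        time.sleep(interval)


if __name__ == "__main__":
    watch_dirty(interval=0.5, rounds=3)
```

If Dirty stays large after the cp prompt has returned, the copy is not finished from the drive's point of view.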

If you want cp to behave more reliably, you have a few options.

OPTION 1 — DD WITH CONV=FSYNC

dd is the tool most of us already know for writing ISO images to USB drives, and the conv=fsync option is the reason it is reliable for that purpose. After writing all the data, dd calls fsync() on the output file descriptor before exiting. The prompt does not return until the drive has physically confirmed the write.

Code:
dd if=sourcefile of=/dev/sdX bs=4M conv=fsync status=progress

This is not just for ISO images. You can use dd with conv=fsync to copy any single file and get a guaranteed flush on exit. It is not the most elegant tool for general file copying, but it gets the job done.
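The same pattern, copy everything and then fsync() the output once before returning, can be sketched in Python. This is only an illustration of the idea behind conv=fsync, not a reimplementation of dd; the function name copy_with_fsync and the 4 MiB chunk size (mirroring bs=4M) are assumptions of the example.

```python
import os
import shutil
import tempfile


def copy_with_fsync(src: str, dst: str, chunk_size: int = 4 * 1024 * 1024) -> None:
    """Copy src to dst in chunks, then fsync the output before returning."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)  # plain buffered copy
        fout.flush()                               # push Python's buffer to the kernel
        os.fsync(fout.fileno())                    # wait until the kernel reports it flushed


if __name__ == "__main__":
    # Demo with throwaway files in a temporary directory.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "sourcefile")
    with open(source, "wb") as f:
        f.write(b"hello\n")
    copy_with_fsync(source, os.path.join(workdir, "copy"))
    print("copied and flushed")
```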

OPTION 2 — RSYNC WITH --FSYNC

If you are copying files rather than writing directly to a block device, rsync with the --fsync flag is the cleanest solution. It calls fsync() on each file after writing it, giving you the same guarantee as dd conv=fsync but with rsync's full feature set including progress reporting, checksumming, and partial transfer recovery.

Code:
rsync --fsync sourcefile /path/to/destination/

This option requires rsync 3.2.4 or newer. Check your version with rsync --version before relying on it.
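Rather than comparing version numbers, a script can simply ask the installed rsync whether it lists the option. The sketch below assumes that rsync --help prints its option list on standard output; the function names are made up for the example.

```python
import subprocess


def help_mentions_fsync(help_text: str) -> bool:
    """True if an rsync help listing contains the --fsync option."""
    return "--fsync" in help_text.split()


def rsync_supports_fsync(rsync: str = "rsync") -> bool:
    """Run 'rsync --help' and look for --fsync; False if rsync is missing."""
    try:
        result = subprocess.run([rsync, "--help"], capture_output=True,
                                text=True, check=False)
    except FileNotFoundError:
        return False
    return help_mentions_fsync(result.stdout)


if __name__ == "__main__":
    print("rsync --fsync available:", rsync_supports_fsync())
```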

OPTION 3 — SYNC WITH A SPECIFIC FILESYSTEM

If you are set on using cp and just want to flush afterward, the sync command has a --file-system option that is more targeted than a bare sync call:

Code:
cp bigfile /mnt/usb/ && sync --file-system /mnt/usb/bigfile

The --file-system flag causes sync to call syncfs() on the filesystem containing that file, rather than issuing a global flush. On Linux, syncfs() waits for the writes to complete. It is more precise than a global sync and does not have to wait on unrelated devices.

This requires GNU coreutils 8.24 or newer.
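Python's os module has no syncfs() wrapper, so a script that wants the same per-filesystem flush has to reach for the C library. The sketch below assumes Linux with a libc that exports syncfs() (glibc does); the function name syncfs_for is made up for the example.

```python
import ctypes
import os
import tempfile

# Assumption: the process's C library exports syncfs(), as glibc on Linux does.
_libc = ctypes.CDLL(None, use_errno=True)


def syncfs_for(path: str) -> None:
    """Flush the filesystem that contains path and wait for completion."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Flush whichever filesystem holds the temporary directory.
    syncfs_for(tempfile.gettempdir())
    print("filesystem flushed")
```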

OPTION 4 — EJECT INSTEAD OF UMOUNT

If you are working with removable media mounted via the mount command, use eject rather than a bare umount when you are done. umount flushes the filesystem's pending writes before it returns, but it stops there. eject unmounts the filesystem, which triggers that same flush, and then sends the hardware eject command to the device. The drive will not pop out until the kernel is satisfied that all data is on the media.

Code:
eject /dev/sdX

or by mount point:

Code:
eject /mnt/usb

If the device is not physically ejectable, eject will still handle the flush and unmount cleanly. You can also use:

Code:
udisksctl power-off --block-device /dev/sdX

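This is the udisks2 equivalent, commonly used in desktop environments. It expects the filesystems on the drive to be unmounted first (for example with udisksctl unmount --block-device /dev/sdX1), and it ensures a clean flush before powering off the device.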
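Before pulling the drive, a script can confirm that the unmount really happened by checking /proc/self/mounts. This is a minimal sketch assuming Linux; the function names are made up for the example, and the target has to be a mount point or a partition device such as /dev/sdX1 exactly as it appears in that file.

```python
def unescape_mount_field(field: str) -> str:
    """Undo the octal escapes the kernel uses for space, tab, newline, backslash."""
    for code, char in (("\\040", " "), ("\\011", "\t"),
                       ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(code, char)
    return field


def is_mounted(target: str, mounts_text: str) -> bool:
    """True if target appears as a device or mount point in mounts_text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device = unescape_mount_field(fields[0])
            mount_point = unescape_mount_field(fields[1])
            if target in (device, mount_point):
                return True
    return False


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        mounts = f.read()
    # /mnt/usb is the example mount point used above.
    print("/mnt/usb still mounted:", is_mounted("/mnt/usb", mounts))
```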


Here is a quick reference for the tools and when each one gives you a real guarantee:

cp (bare)
    Writes to page cache. Prompt returns immediately. No guarantee data is on media.

sync (bare, as specified by POSIX)
    Only required to schedule the writes, so it may return before they complete.

sync (bare, on Linux)
    Waits for write completion. Reliable, but still a global operation.

sync --file-system /path/to/file
    Flushes just the relevant filesystem and waits. Better choice than bare sync.

dd if=source of=dest bs=4M conv=fsync
    Calls fsync() before exiting. Reliable guarantee.

rsync --fsync source dest
    Calls fsync() per file. Clean, scriptable, reliable. Requires rsync 3.2.4+.

eject /dev/sdX
    Flush, unmount, and hardware eject in one step. Best practice for removable media.

udisksctl power-off --block-device /dev/sdX
    Desktop-friendly way to flush and power off a drive once its filesystems are unmounted.


The kernel is not trying to deceive you. Write-back caching is a feature, and a good one. But when you are working with removable media or any situation where the data absolutely has to survive — use a tool that calls fsync(), or explicitly eject the device through the proper channel. A prompt returning to you is not a promise. An fsync() is.
 


Thanks for this treatment of the various options. I had thought that plain old "sync" was sufficient and that umount implicitly took care of such things - why else would there even -be- an umount command? Your explanation makes things much clearer.

I do note that on shutdown I see a kernel message on screen indicating "syncing all filesystems" right before "terminating all processes".
 
thank you for this bit of information. now i wish i had a straight-out fsync() terminal command...

maybe it does exist. or i could write a brief c program that wraps it. i guess not.

the other day i was trying to copy a 350mib file onto a very slow lexar usb 2 drive, on an admittedly junk computer used primarily for web browsing. it had spirallinux with the cinnamon desktop, so i used the nemo file manager. the nemo copy dialog box stayed up for over 30 minutes. a short time before, i tried with a different pen drive, and that one finished in about 10 minutes. but over half an hour with no change whatsoever in the copy dialog? what the hey?

since it was a 64gb drive i was trying to copy to, i didn't want to unplug it, but i lost patience. on my "main" elderly computer i checked that disk with gnome disk utility. it reported everything ok with check disk. but i have found garbled data before, especially on these disks, which suck at copying files larger than 20mib. i don't understand why.

there are some pen drives that force the system to be more honest with file manager copy progress. but i always get angry when the progress goes "very fast" up until about 700mib, then slows down gradually. i have become a slave to executing sync on a terminal.

sometimes when i copy a large amount of data around on gnome or another desktop based on it, and then i shut down, it refuses to show me any text. nothing from "systemd" or anything else. that is irritating. but usually by then all things found their destinations without worry. all pen drives were disconnected.

at other times i do get the "red star dance" for several minutes from "systemd", related to a heavy copy operation.
 

fsync() in a terminal <ENTER> shows '>', expecting more input. I don't know what it wants.

So I just used plain old sync. Thoughts?

EDIT: I don't have fsync installed on Debian 13.3, and neither is it in the official repos.

sudo apt install fsync returns "Unable to locate package fsync".
 
Copy a large file to a USB drive with cp
You do that and you're in for the waiting of your life bc USB sticks are generally slow (even v3.2 ones) and cp makes everything slower. Add to that a slow generic filesystem such as ext4 as either recipient or sender and you can go take a shower and forget about the file bc it won't be finished when you get out of the bath. You could say I've "been there, done that" so many times in different variants that I know them all.
On top of that cp is an ancient command that doesn't seem to use all available CPU threads, which makes things even slower.

The only reliable and fast way to copy a large file to a USB drive (regardless of whether it's a USB stick or USB SSD, or another regular storage device (SATA/M.2)) is rsync with both recipient and sender on XFS. And by "large file" I mean anything above 15 GiB. The latest record of rsync for transferring 600 GiB (a favorite TV show of mine stored inside a 7z file for easier transfer) from WD RED NAS SSD to WD RED NAS HDD (both XFS) is 10 minutes and it maintained this speed (a little more than 1.05 GiB/s) during the whole time of the transfer.
 
The fsync functionality described by @dos2unix in post #1 is a system call, not a command you can run from the command line, as you discovered.

fsync() is a system call that a command-line program such as dd makes when it is given an option like "conv=fsync". That option tells dd to ask the kernel, through fsync(), to flush the data to disk.

The following is an example of one way fsync() works.

First create a file named file2 whose contents consist of one word: hello.
Code:
[~]$ ls
file2

[~]$ cat file2
hello

Now we use dd to copy the file without any options just to check that it copies the file:
Code:
[~]$ dd if=file2 of=file2copy
0+1 records in
0+1 records out
6 bytes copied, 8.7098e-05 s, 68.9 kB/s

Then we check that the new file, called: file2copy, exists and check its contents:
Code:
[~]$ ls
file2  file2copy

[~]$ cat file2
hello

[~]$ cat file2copy
hello

So both files exist and file2 has been successfully copied into file2copy by dd from the command line.

To see what happens when dd creates that second file, or specifically to trace all the system calls involved, one can use the strace program, which records the system calls in a file. In the following, that file is called tracefile, and the command below creates it whilst running the same dd command as above, this time copying to a new file called file2traced_copy:
Code:
[~]$ strace -otracefile dd if=file2 of=file2traced_copy
0+1 records in
0+1 records out
6 bytes copied, 0.000457988 s, 13.1 kB/s

This invocation of dd writes a new output file, file2traced_copy, which one expects to hold the same contents as file2, as is shown here:
Code:
[~]$ cat file2traced_copy
hello

Now one can open the file: tracefile, to inspect all the system calls, but I won't show it here because it's long, and we are only interested in whether it called the fsync() function, in order to flush the data to disk. To check whether the fsync() system call was made, one can run the following on the tracefile:
Code:
[~]$ grep fsync tracefile

The fsync() function did not appear in the tracefile and so was not used in a system call.

Now, if one runs the same dd command, but with the option "conv=fsync", which is designed to call the fsync() function to do its work, a different result will appear:
Code:
[~]$ strace -otracefile2 dd if=file2 of=file2traced_copy_with_fsync conv=fsync
0+1 records in
0+1 records out
6 bytes copied, 0.00473791 s, 1.3 kB/s

A new file named file2traced_copy_with_fsync, containing the same contents as the original file2, is created by dd, which can be checked by inspecting that file. Whether fsync() was called by the new dd command and executed by the kernel can be checked in the new trace file, tracefile2, which traced that dd command with the fsync option:
Code:
[~]$ grep fsync tracefile2
execve("/usr/bin/dd", ["dd", "if=file2", "of=file2traced_copy_with_fsync", "conv=fsync"], 0x7ffe2913dfb0 /* 29 vars */) = 0
openat(AT_FDCWD, "file2traced_copy_with_fsync", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fsync(1)                                = 0
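The first two lines match only because the output file name and the conv=fsync argument contain the string "fsync". The third line, fsync(1) = 0, is the actual system call, and its return value of 0 shows that fsync() was called by dd and succeeded.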
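A plain grep for "fsync" matches any line that contains the string, including the execve and openat lines in the output above. A stricter filter keeps only lines where the traced call itself is fsync. The Python sketch below assumes the trace format shown above (one call per line, optionally prefixed with a process ID when strace -f is used); the function name is made up for the example.

```python
import re

# A syscall line starts with the call name followed by "(";
# with "strace -f -o file" each line is also prefixed by a PID.
SYSCALL_LINE = re.compile(r"^(?:\d+\s+)?(\w+)\(")


def syscall_lines(trace_text: str, name: str) -> list:
    """Return only the lines where the traced system call itself is `name`."""
    hits = []
    for line in trace_text.splitlines():
        match = SYSCALL_LINE.match(line)
        if match and match.group(1) == name:
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        'openat(AT_FDCWD, "file2traced_copy_with_fsync", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3',
        "fsync(1)                                = 0",
    ])
    for line in syscall_lines(sample, "fsync"):
        print(line)
```

On the command line, anchoring the pattern has the same effect: grep '^fsync(' tracefile2.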

The upshot is that fsync() is called by programs but is itself not a simple command line program. Info on the function is in the manpage: man fsync.
 

Yes, I wish that wasn't the case, but for now, it has to be called from an application that makes the system call.
(This can be done in a Python script.)
 
I wrote this a few months back.

fsync_tree.py
Code:
#!/usr/bin/env python3
"""
fsync_tree.py - Force fsync on all files in a directory tree (or a single file)
after a cp operation completes.
Usage:
    python3 fsync_tree.py /destination/path
"""

import os
import sys
import time

def fsync_path(target: str):
    synced = 0
    errors = 0

    if os.path.isfile(target):
        # Single file mode
        paths = [target]
    elif os.path.isdir(target):
        # Walk the entire tree
        paths = []
        for dirpath, dirnames, filenames in os.walk(target):
            # Also fsync the directories themselves
            paths.append(dirpath)
            for filename in filenames:
                paths.append(os.path.join(dirpath, filename))
    else:
        print(f"ERROR: '{target}' is not a file or directory.", file=sys.stderr)
        sys.exit(1)

    for path in paths:
        try:
            # Open with O_RDONLY; works for both files and dirs
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
                print(f"  synced: {path}")
                synced += 1
            finally:
                os.close(fd)
        except OSError as e:
            print(f"  ERROR syncing {path}: {e}", file=sys.stderr)
            errors += 1

    return synced, errors


def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <file_or_directory>", file=sys.stderr)
        sys.exit(1)

    target = sys.argv[1]
    print(f"Starting fsync on: {target}")
    start = time.monotonic()

    synced, errors = fsync_path(target)

    elapsed = time.monotonic() - start
    print(f"\nDone. {synced} path(s) synced, {errors} error(s) in {elapsed:.2f}s")
    sys.exit(1 if errors else 0)


if __name__ == "__main__":
    main()

How to use this with "cp":

Code:
cp -a /source/tree /destination/tree && python3 fsync_tree.py /destination/tree
 

