Support for fsync

Support for fsync

Roger Leigh
Hi folks,

While doing some profiling and performance testing for our libtiff-using
application, I realised that we couldn't accurately profile the write
times, because libtiff never issues a full flush of the data to disc when
you call TIFFClose.  Unless you use TIFFFdOpen, you have no way to get at
the open fd to issue the fsync yourself, and doing this cross-platform
means repeating all of that for Windows as well.  As a result, we could
have several gigabytes of pending buffered IO that never contributed to
the measured execution time.
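
To make the profiling problem concrete, here is a minimal POSIX-only sketch that times the buffered write separately from the fsync(2) that TIFFClose() alone never waits for. The helper `timed_write_file` is hypothetical and is not libtiff code. On a machine with plenty of free RAM, most of the real I/O cost would be expected to show up in the second figure rather than the first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_seconds(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper (not part of libtiff): write len bytes to path, then
 * fsync, timing the buffered write and the fsync separately.
 * Returns 0 on success, -1 on error.
 */
int timed_write_file(const char *path, const void *buf, size_t len,
                     double *write_secs, double *sync_secs)
{
    struct timespec t0, t1, t2;
    const char *p = buf;
    size_t remaining = len;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (remaining > 0) {
        ssize_t n = write(fd, p, remaining);   /* may be a partial write */
        if (n < 0) {
            if (errno == EINTR)
                continue;                      /* interrupted: retry */
            close(fd);
            return -1;
        }
        p += n;
        remaining -= (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);       /* data handed to the kernel */

    if (fsync(fd) != 0) {                      /* wait for stable storage */
        close(fd);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    *write_secs = elapsed_seconds(&t0, &t1);
    *sync_secs = elapsed_seconds(&t1, &t2);
    return close(fd);
}
```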

This also has important implications for data integrity; it would be
nice to have the option to checkpoint the data.  While TIFFFlush[Data]()
looks like it should fit the bill, and does write out the data, it
*doesn't* (and, for performance reasons, likely shouldn't) also call
fsync/fdatasync to ensure that the data has reached stable storage.
TIFFFlush is more like fflush(3) than fsync(2), in that it submits the IO
requests but makes no guarantee that they have actually completed.
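
The fflush(3)/fsync(2) distinction can be sketched with plain stdio, assuming a POSIX system. The function name `checkpoint_stream` is hypothetical: it first pushes user-space buffers into the kernel (roughly the role TIFFFlush plays) and then blocks until the kernel reports the data is on stable storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical checkpoint for a stdio stream:
 *   fflush(3) submits user-space buffers to the kernel (like TIFFFlush),
 *   fsync(2) then waits until the kernel has written them to the device.
 * Returns 0 on success, -1 on failure.
 */
int checkpoint_stream(FILE *fp)
{
    int fd;

    if (fflush(fp) != 0)             /* submit buffered data to the kernel */
        return -1;
    fd = fileno(fp);
    if (fd < 0)
        return -1;
    return fsync(fd) == 0 ? 0 : -1;  /* block until it reaches stable storage */
}
```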

Would it be worth considering the addition of a new function, e.g.
TIFFSync[Data], which would simply call fsync(2) or its Windows
equivalent on the open file?  It could potentially also call TIFFFlush
internally to be sure it's committing a consistent state.  I could then
issue TIFFSync() prior to TIFFClose, or TIFFSyncData as I write out each
IFD, and be sure that the data writes have completed (or failed) when the
call returns.
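
A rough sketch of the platform-specific core such a function might need, assuming the caller already holds a C runtime file descriptor. `sync_fd` and its `data_only` flag are hypothetical and are not libtiff API. FlushFileBuffers() is the usual Windows counterpart of fsync(2); fdatasync(2) is used only where `_POSIX_SYNCHRONIZED_IO` advertises support, with a fallback to fsync(2).

```c
#if defined(_WIN32)
#  include <io.h>        /* _get_osfhandle */
#  include <windows.h>   /* FlushFileBuffers, HANDLE */
#else
#  define _POSIX_C_SOURCE 200809L
#  include <unistd.h>    /* fsync, fdatasync, _POSIX_SYNCHRONIZED_IO */
#endif

/*
 * Hypothetical portable helper in the spirit of the proposed
 * TIFFSync/TIFFSyncData (name and signature are assumptions).
 * data_only != 0 requests the cheaper "data" variant where available.
 * Returns 0 on success, -1 on failure.
 */
int sync_fd(int fd, int data_only)
{
#if defined(_WIN32)
    HANDLE h = (HANDLE)_get_osfhandle(fd);
    (void)data_only;                        /* no separate data-only call used here */
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    return FlushFileBuffers(h) ? 0 : -1;
#else
#  if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    if (data_only)
        return fdatasync(fd) == 0 ? 0 : -1; /* skip non-essential metadata */
#  else
    (void)data_only;                        /* fall back to a full fsync */
#  endif
    return fsync(fd) == 0 ? 0 : -1;         /* data and metadata */
#endif
}
```

Inside libtiff the descriptor would come from the TIFF handle; Windows builds that store a native file HANDLE rather than a CRT descriptor may not need the _get_osfhandle() step, a detail this sketch leaves open.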

Any other thoughts?

Roger
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/

Re: Support for fsync

Bob Friesenhahn
On Mon, 22 May 2017, Roger Leigh wrote:
>
> Would it be worth considering the addition of a new function, e.g.
> TIFFSync[Data], which would simply call fsync(2) or its Windows
> equivalent on the open file?  It could potentially also call TIFFFlush
> internally to be sure it's committing a consistent state.  I could then
> issue TIFFSync() prior to TIFFClose, or TIFFSyncData as I write out each
> IFD, and be sure that the data writes have completed (or failed) when the
> call returns.

GraphicsMagick provides an option (MAGICK_IO_FSYNC=TRUE environment
variable) to fsync after a file is written.  I don't know if anyone
uses it.
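
An opt-in switch like that could be mirrored with a check such as the following. This is a hypothetical sketch, not GraphicsMagick's implementation: the helper name is invented, the environment variable name is passed in by the caller (e.g. "MAGICK_IO_FSYNC"), and treating only the exact string TRUE as "on" is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>   /* getenv */
#include <string.h>   /* strcmp */
#include <unistd.h>   /* fsync */

/*
 * Hypothetical opt-in fsync: only sync fd when the environment variable
 * env_name is set to exactly "TRUE" (assumed convention).
 * Returns 0 if no sync was requested or the sync succeeded, -1 if fsync failed.
 */
int fsync_if_env(int fd, const char *env_name)
{
    const char *value = getenv(env_name);
    if (value == NULL || strcmp(value, "TRUE") != 0)
        return 0;                  /* not requested: leave data in the page cache */
    return fsync(fd) == 0 ? 0 : -1;
}
```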

I think that your proposal is worth considering and implementing.  All
that is needed is a source patch, with associated documentation.

Unless the parent directory is also synchronized, there is a
possibility of losing a whole new file, even if fsync has been called
on the file.  For example:

#include <sys/types.h>
#include <dirent.h>   /* opendir, dirfd, closedir */
#include <unistd.h>   /* fsync */

// Synchronize a specified directory path to underlying store.
// Returns 0 on success, -1 on failure.
int sync_directory(const char *dirpath)
{
    int status = 0;
    DIR *dir = opendir(dirpath);
    if ( dir != NULL )
    {
        int dfd = dirfd(dir);
        if ( dfd >= 0 )
        {
            // Flush the directory's entries (e.g. the name of a newly
            // created file) to stable storage
            status = fsync(dfd);
        }
        else
        {
            status = -1;
        }
        closedir(dir);
    }
    else
    {
        status = -1;
    }
    return status;
}

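Note that there is also fdopendir(), which creates a directory stream
from an already-open file descriptor.  However, that descriptor must
refer to the directory itself (fdopendir() fails with ENOTDIR
otherwise), so the fd libtiff already has for the TIFF file cannot be
passed to it directly.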
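
Putting the pieces together, a minimal POSIX sketch (assuming Linux-like semantics) of making a brand-new file durable: fsync the file, close it, then fsync its parent directory. `write_file_durably` and `sync_parent_dir` are hypothetical helpers. Instead of opendir()/dirfd(), this version opens the directory with open() and O_DIRECTORY, which also yields a descriptor that can be passed to fsync(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>   /* dirname */
#include <stdlib.h>   /* free */
#include <string.h>   /* strdup */
#include <unistd.h>

/* fsync the directory containing path, so the new entry is durable. */
static int sync_parent_dir(const char *path)
{
    char *copy = strdup(path);          /* dirname() may modify its argument */
    int dfd, status;

    if (copy == NULL)
        return -1;
    dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy);                         /* open() has already used the string */
    if (dfd < 0)
        return -1;
    status = fsync(dfd);
    close(dfd);
    return status == 0 ? 0 : -1;
}

/* Hypothetical helper: create/replace path with len bytes from buf. */
int write_file_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    size_t remaining = len;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (remaining > 0) {
        ssize_t n = write(fd, p, remaining);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        remaining -= (size_t)n;
    }
    if (fsync(fd) != 0) {               /* 1. file contents and inode */
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return sync_parent_dir(path);       /* 2. directory entry naming the file */
}
```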

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/