Quantcast

Qs about support for more than 2^16 IFDs and writing performance

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
Hi everyone,
I am developing an application that uses libTIFF for reading/writing TIFF files and I have the following questions about libTIFF. I apologize in advance if these questions have been asked but I could not find answers to them in the archive.

1. I noticed that libTIFF does not support reading images from TIFF files as IFD's greater than 65535. The TIFF specification does not appear to impose such a limit but libTIFF does. I have a few clients who have 100,000 image TIFF stacks that they would like to be able to read. 

2. I noticed that libTIFF does appear to support writing > 65536 images into a TIFF file. Is this the expected behaviour or an over-sight?

3. Lastly, when i tested writing a large number of images into a TIFF file, I found that the performance was very slow. It took ~2.5 hrs to write 66200 images, each of size just 64x64. I came across this github post: https://github.com/escabe/libtiff/commit/58b4c3ba4478987ecfe1e793b9d925e59eecfa36

about why the performance is poor and a possible fix for this. Is this patch recommended? 

Are there plans to limit the 65536 IFD restriction for TIFF files and also improve the writing performance anytime soon? Or is the recommendation to patch my local copy of the TIFF library?

Regards,
Dinesh

_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Paavo Helde

Hi, I can comment from the viewpoint of just another user of libtiff. I have seen the slowdown myself (with ca 10,000 IFD-s) and profiled the cause to be the TIFFLinkDirectory() function which traverses the whole file again and again when adding new IFD-s.

Some kind of caching is needed for avoiding this traversal, but not sure if the approach demonstrated by this patch is the best one. It also contains modification of TIF (e.g. tif->tif_dirnumber++;) in a function called "Check" which is smelly.

The patch does not address the 65535 IFD limit problem (for solving that it looks like one needs to add a bunch of new libtiff functions for directories as the existing ones are defined in terms of uint16).

Cheers
Paavo


On 6.02.2017 22:43, Dinesh Iyer wrote:
Hi everyone,
I am developing an application that uses libTIFF for reading/writing TIFF files and I have the following questions about libTIFF. I apologize in advance if these questions have been asked but I could not find answers to them in the archive.

1. I noticed that libTIFF does not support reading images from TIFF files as IFD's greater than 65535. The TIFF specification does not appear to impose such a limit but libTIFF does. I have a few clients who have 100,000 image TIFF stacks that they would like to be able to read. 

2. I noticed that libTIFF does appear to support writing > 65536 images into a TIFF file. Is this the expected behaviour or an over-sight?

3. Lastly, when i tested writing a large number of images into a TIFF file, I found that the performance was very slow. It took ~2.5 hrs to write 66200 images, each of size just 64x64. I came across this github post: https://github.com/escabe/libtiff/commit/58b4c3ba4478987ecfe1e793b9d925e59eecfa36

about why the performance is poor and a possible fix for this. Is this patch recommended? 

Are there plans to limit the 65536 IFD restriction for TIFF files and also improve the writing performance anytime soon? Or is the recommendation to patch my local copy of the TIFF library?

Regards,
Dinesh


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Bob Friesenhahn
In reply to this post by Dinesh Iyer
On Mon, 6 Feb 2017, Dinesh Iyer wrote:

> Hi everyone,
> I am developing an application that uses libTIFF for reading/writing TIFF
> files and I have the following questions about libTIFF. I apologize in
> advance if these questions have been asked but I could not find answers to
> them in the archive.
>
> 1. I noticed that libTIFF does not support reading images from TIFF files
> as IFD's greater than 65535. The TIFF specification does not appear to
> impose such a limit but libTIFF does. I have a few clients who have 100,000
> image TIFF stacks that they would like to be able to read.
>
> 2. I noticed that libTIFF does appear to support writing > 65536 images
> into a TIFF file. Is this the expected behaviour or an over-sight?

It is likely that the arbitrary limit is intended to foil looping IDFs
(denial of service opportunity) while also limiting the performance
and memory impact of the check. The linear scan is perhaps not the
fastest algorithm but it should not take terribly long (much less than
a second) to scan 65535 entries.

> 3. Lastly, when i tested writing a large number of images into a TIFF file,
> I found that the performance was very slow. It took ~2.5 hrs to write 66200
> images, each of size just 64x64. I came across this github post:
> https://github.com/escabe/libtiff/commit/58b4c3ba4478987ecfe1e793b9d925e59eecfa36
>
> about why the performance is poor and a possible fix for this. Is this
> patch recommended?

It looks like this patch also caches the IFD offset lists in the
writer but also improves lookup time for the case where the offset
searched for is the one just stored.

Perhaps an algorithm like a Bloom filter
(https://en.wikipedia.org/wiki/Bloom_filter) could be used to improve
efficiency, but I am not sure of how much code is required for an
implementation.  The Bloom filter can have false positives so an
exhaustive scan would sometimes be needed to eliminate false
positives.

> Are there plans to limit the 65536 IFD restriction for TIFF files and also
> improve the writing performance anytime soon? Or is the recommendation to
> patch my local copy of the TIFF library?

The 65536 IFD restriction is likely arbitrary.

The most important thing is that have you profiled the code for your
slow test case to see why it is so slow to write?  Are you able to
post simple portable code for your test case so that others can
observe the same issue that you are?

It is quite possible that the writing bottleneck is something other
than this part of the code.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Paavo Helde

On 7.02.2017 2:04, Bob Friesenhahn wrote:
>   The linear scan is perhaps not the
> fastest algorithm but it should not take terribly long (much less than
> a second) to scan 65535 entries.e e

And if you multiply a second by 60,000, then you get many hours. Writing
a TIFF file with 60,000 IFD-s will call TIFFLinkDirectory() 60,000 times.

Note that such files typically do not fit in L3 cache and thus
traversing them may be extremely slow. We are talking mostly of BigTiff
format here.

Cheers
Paavo



_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

AG-9
In reply to this post by Bob Friesenhahn

> On Feb 6, 2017, at 4:04 PM, Bob Friesenhahn <[hidden email]> wrote:


> The linear scan is perhaps not the
> fastest algorithm but it should not take terribly long (much less than
> a second) to scan 65535 entries.
>

If you need to extract a single image the linear scan is not a major problem.  If however you need to extract more images, your linear scan becomes O(N^2) which in my experience becomes prohibitively expensive for large stacks.

It would also be good if the 65535 limit is removed but I have not yet received any customer complaints about that.

A.G.

_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Bob Friesenhahn
In reply to this post by Paavo Helde
On Tue, 7 Feb 2017, Paavo Helde wrote:

>
> On 7.02.2017 2:04, Bob Friesenhahn wrote:
>>   The linear scan is perhaps not the
>> fastest algorithm but it should not take terribly long (much less than
>> a second) to scan 65535 entries.e e
>
> And if you multiply a second by 60,000, then you get many hours. Writing
> a TIFF file with 60,000 IFD-s will call TIFFLinkDirectory() 60,000 times.
>
> Note that such files typically do not fit in L3 cache and thus
> traversing them may be extremely slow. We are talking mostly of BigTiff
> format here.

This sounds reasonable.  We need a good test case for this (portable C
source code) which be used to measure the performance impact and
evaluate solutions.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
We have had about 5-6 users, mostly from the biomedical imaging community, complain to use about this limitation for reading. They have image stacks of about 100,000 images stored in a single TIFF file. These are not all BigTIFF files. They use applications such as ImageJ, which is very popular among the image processing community, which are able to read such files without any issues.

I can send a standalone C-code for profiling shortly.

Regards,
Dinesh



On Mon, Feb 6, 2017 at 7:58 PM, Bob Friesenhahn <[hidden email]> wrote:
On Tue, 7 Feb 2017, Paavo Helde wrote:

>
> On 7.02.2017 2:04, Bob Friesenhahn wrote:
>>   The linear scan is perhaps not the
>> fastest algorithm but it should not take terribly long (much less than
>> a second) to scan 65535 entries.e e
>
> And if you multiply a second by 60,000, then you get many hours. Writing
> a TIFF file with 60,000 IFD-s will call TIFFLinkDirectory() 60,000 times.
>
> Note that such files typically do not fit in L3 cache and thus
> traversing them may be extremely slow. We are talking mostly of BigTiff
> format here.

This sounds reasonable.  We need a good test case for this (portable C
source code) which be used to measure the performance impact and
evaluate solutions.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
Hi,
I am providing a link to an plot which shows how the time increases for each successive call to TIFFwriteDirectory. I have written 20000, 64x64 images to this TIFF file. The total time taken to write these 20000 images will be the sum of all these times.


On Tue, Feb 7, 2017 at 11:16 AM, Dinesh Iyer <[hidden email]> wrote:
We have had about 5-6 users, mostly from the biomedical imaging community, complain to use about this limitation for reading. They have image stacks of about 100,000 images stored in a single TIFF file. These are not all BigTIFF files. They use applications such as ImageJ, which is very popular among the image processing community, which are able to read such files without any issues.

I can send a standalone C-code for profiling shortly.

Regards,
Dinesh



On Mon, Feb 6, 2017 at 7:58 PM, Bob Friesenhahn <[hidden email]> wrote:
On Tue, 7 Feb 2017, Paavo Helde wrote:

>
> On 7.02.2017 2:04, Bob Friesenhahn wrote:
>>   The linear scan is perhaps not the
>> fastest algorithm but it should not take terribly long (much less than
>> a second) to scan 65535 entries.e e
>
> And if you multiply a second by 60,000, then you get many hours. Writing
> a TIFF file with 60,000 IFD-s will call TIFFLinkDirectory() 60,000 times.
>
> Note that such files typically do not fit in L3 cache and thus
> traversing them may be extremely slow. We are talking mostly of BigTiff
> format here.

This sounds reasonable.  We need a good test case for this (portable C
source code) which be used to measure the performance impact and
evaluate solutions.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/



_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
Sorry, I forgot to attach the link:


Regards,
Dinesh

On Tue, Feb 7, 2017 at 12:36 PM, Dinesh Iyer <[hidden email]> wrote:
Hi,
I am providing a link to an plot which shows how the time increases for each successive call to TIFFwriteDirectory. I have written 20000, 64x64 images to this TIFF file. The total time taken to write these 20000 images will be the sum of all these times.


On Tue, Feb 7, 2017 at 11:16 AM, Dinesh Iyer <[hidden email]> wrote:
We have had about 5-6 users, mostly from the biomedical imaging community, complain to use about this limitation for reading. They have image stacks of about 100,000 images stored in a single TIFF file. These are not all BigTIFF files. They use applications such as ImageJ, which is very popular among the image processing community, which are able to read such files without any issues.

I can send a standalone C-code for profiling shortly.

Regards,
Dinesh



On Mon, Feb 6, 2017 at 7:58 PM, Bob Friesenhahn <[hidden email]> wrote:
On Tue, 7 Feb 2017, Paavo Helde wrote:

>
> On 7.02.2017 2:04, Bob Friesenhahn wrote:
>>   The linear scan is perhaps not the
>> fastest algorithm but it should not take terribly long (much less than
>> a second) to scan 65535 entries.e e
>
> And if you multiply a second by 60,000, then you get many hours. Writing
> a TIFF file with 60,000 IFD-s will call TIFFLinkDirectory() 60,000 times.
>
> Note that such files typically do not fit in L3 cache and thus
> traversing them may be extremely slow. We are talking mostly of BigTiff
> format here.

This sounds reasonable.  We need a good test case for this (portable C
source code) which be used to measure the performance impact and
evaluate solutions.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/




_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Olivier Paquet-2
In reply to this post by Dinesh Iyer
2017-02-07 12:36 GMT-05:00 Dinesh Iyer <[hidden email]>:
Hi,
I am providing a link to an plot which shows how the time increases for each successive call to TIFFwriteDirectory. I have written 20000, 64x64 images to this TIFF file. The total time taken to write these 20000 images will be the sum of all these times.
 
It is very clearly O(N*N) and we even know why. This is a bug as far as I'm concerned.

As for the 64k limit, I think it's reasonably easy to fix it for sequential access but perhaps not so easy for functions like TIFFSetDirectory() because of the ABI. Whether we should fix the former only and how we could fix the latter are both interesting questions. I'm ok with fixing the internals as a first step so that at least sequential reading and writing works beyond 64k.

Olivier


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Bob Friesenhahn
On Tue, 7 Feb 2017, Olivier Paquet wrote:
>
> As for the 64k limit, I think it's reasonably easy to fix it for sequential
> access but perhaps not so easy for functions like TIFFSetDirectory()
> because of the ABI. Whether we should fix the former only and how we could

It would be necessary to add a new typedef (similar to tdir_t) and
functions for dealing with the larger directory ids.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
Hi everyone,
Do you have a timeline for when the fix for the slow TIFF writing will be incorporated into the library? If this might take a while i.e. more than a few months, is the workaround suggested in the github post https://github.com/escabe/libtiff/commit/58b4c3ba4478987ecfe1e793b9d925e59eecfa36 reasonable? Can I use this as a temporary workaround?

Regards,
Dinesh

On Tue, Feb 7, 2017 at 3:12 PM, Bob Friesenhahn <[hidden email]> wrote:
On Tue, 7 Feb 2017, Olivier Paquet wrote:
>
> As for the 64k limit, I think it's reasonably easy to fix it for sequential
> access but perhaps not so easy for functions like TIFFSetDirectory()
> because of the ABI. Whether we should fix the former only and how we could

It would be necessary to add a new typedef (similar to tdir_t) and
functions for dealing with the larger directory ids.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Bob Friesenhahn
On Sun, 12 Feb 2017, Dinesh Iyer wrote:

> Hi everyone,
> Do you have a timeline for when the fix for the slow TIFF writing will be
> incorporated into the library? If this might take a while i.e. more than a
> few months, is the workaround suggested in the github post
> https://github.com/escabe/libtiff/commit/58b4c3ba4478987ecfe1e793b9d925
> e59eecfa36 reasonable? Can I use this as a temporary workaround?

The libtiff project is staffed by volunteers.  Currently no one has
volunteered to take on this work.

Does the author of this github post approve the inclusion of the patch
into libtiff?  Has anyone contacted the author?

We are still needing a minimal test-case (in portable C code) which
exhibits the slow writing problem so that the quality of any solution
can be evaluated.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Olivier Paquet-2
2017-02-13 9:35 GMT-05:00 Bob Friesenhahn <[hidden email]>:
>
> We are still needing a minimal test-case (in portable C code) which
> exhibits the slow writing problem so that the quality of any solution
> can be evaluated.

Here's my contribution (slow.c):

#include <tiffio.h>
#include <stdlib.h>
#include <stdio.h>

int main( int argc, const char *argv[] )
{
       int n = atoi( argv[1] );
       TIFF* tif = TIFFOpen( "test.tif", "w" );
       for( int i = 0; i < n; ++i )
       {
               TIFFWriteDirectory( tif );
       }
       TIFFClose( tif );
       remove( "test.tif" );
}

Not a valid TIFF but you did say minimal ;-) It clearly shows bad behavior:

> /usr/bin/gcc slow.c -ltiff -ljpeg -llzma -pthread
> time ./a.out 1000
0.056u 0.203s 0:00.29 86.2%
> time ./a.out 10000
3.563u 21.436s 0:25.05 99.7%
> time ./a.out 20000
14.189u 86.316s 1:40.57 99.9%

Olivier
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Qs about support for more than 2^16 IFDs and writing performance

Dinesh Iyer
Hi everyone,
I managed to get in touch with the author of the patch. His name is Martin. He was OK with his work being used in the TIFF library. He also had the following to say about his patch.

I developed this patch really with only incrementally writing a new file in mind, I have not considered at all how well this would work with editing existing files, etc. In that sense I am not sure whether this can easily be merged into the actual library.

But he is OK with this patch being used as a starting point for any fix that will work in all scenarios. Please do let me know if you require any additional information.

Regards,
Dinesh


On Mon, Feb 13, 2017 at 11:22 AM, Olivier Paquet <[hidden email]> wrote:
2017-02-13 9:35 GMT-05:00 Bob Friesenhahn <[hidden email]>:
>
> We are still needing a minimal test-case (in portable C code) which
> exhibits the slow writing problem so that the quality of any solution
> can be evaluated.

Here's my contribution (slow.c):

#include <tiffio.h>
#include <stdlib.h>
#include <stdio.h>

int main( int argc, const char *argv[] )
{
       int n = atoi( argv[1] );
       TIFF* tif = TIFFOpen( "test.tif", "w" );
       for( int i = 0; i < n; ++i )
       {
               TIFFWriteDirectory( tif );
       }
       TIFFClose( tif );
       remove( "test.tif" );
}

Not a valid TIFF but you did say minimal ;-) It clearly shows bad behavior:

> /usr/bin/gcc slow.c -ltiff -ljpeg -llzma -pthread
> time ./a.out 1000
0.056u 0.203s 0:00.29 86.2%
> time ./a.out 10000
3.563u 21.436s 0:25.05 99.7%
> time ./a.out 20000
14.189u 86.316s 1:40.57 99.9%

Olivier
_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/


_______________________________________________
Tiff mailing list: [hidden email]
http://lists.maptools.org/mailman/listinfo/tiff
http://www.remotesensing.org/libtiff/
Loading...