Excessive memory allocation while chopping strips

Excessive memory allocation while chopping strips

Nicolas RUFF
Hello,

I saw in https://trac.osgeo.org/gdal/changeset/39082 that you try to defend against overly large memory allocations.

I have a pathological test case from fuzzing where header values are:
ImageWidth = 1060
ImageLength = 536871550

It will hit the following piece of code in tif_dirread.c:

...
/*
* Some manufacturers make life difficult by writing
* large amounts of uncompressed data as a single strip.
* This is contrary to the recommendations of the spec.
* The following makes an attempt at breaking such images
* into strips closer to the recommended 8k bytes.  A
* side effect, however, is that the RowsPerStrip tag
* value may be changed.
*/
if ((tif->tif_dir.td_planarconfig==PLANARCONFIG_CONTIG)&&
   (tif->tif_dir.td_nstrips==1)&&
   (tif->tif_dir.td_compression==COMPRESSION_NONE)&&
   ((tif->tif_flags&(TIFF_STRIPCHOP|TIFF_ISTILED))==TIFF_STRIPCHOP))
    {
        if ( !_TIFFFillStriles(tif) || !tif->tif_dir.td_stripbytecount )
            return 0;
        ChopUpSingleUncompressedStrip(tif);
    }
...

ChopUpSingleUncompressedStrip() will compute the required number of strips and reach this point:

...
newcounts = (uint64*) _TIFFCheckMalloc(tif, nstrips, sizeof (uint64),
"for chopped \"StripByteCounts\" array");
...

In my case, nstrips = 268435775 (and rowsperstrip = 2), which will try to allocate
268435775 * 8 bytes. This is already beyond 2**31, but we can make it even larger and
beyond 2**32 by playing with ImageLength.
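
For reference, here is a minimal standalone sketch of that arithmetic (values are the ones
from the fuzzed header; the rounding mirrors the usual divide-and-round-up strip computation,
not the exact libtiff internals):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Header values from the pathological test case. */
    uint64_t image_length   = 536871550;  /* ImageLength tag */
    uint64_t rows_per_strip = 2;          /* chosen so strips are close to 8k */

    /* Number of strips after chopping, rounded up. */
    uint64_t nstrips = (image_length + rows_per_strip - 1) / rows_per_strip;

    /* One uint64 per synthesized StripByteCounts (and StripOffsets) entry. */
    uint64_t bytes = nstrips * sizeof(uint64_t);

    printf("nstrips = %llu, allocation = %llu bytes\n",
           (unsigned long long)nstrips, (unsigned long long)bytes);
    /* Prints: nstrips = 268435775, allocation = 2147486200 bytes (> 2**31). */
    return 0;
}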

Rather than playing this cat-and-mouse game of assigning arbitrary limits to every possible tag, I would suggest the following patch in tif_aux.c:

...
 #include <math.h>
+#include <limits.h>
...
 void*
 _TIFFCheckRealloc(TIFF* tif, void* buffer,
                   tmsize_t nmemb, tmsize_t elem_size, const char* what)
 {
     void* cp = NULL;
     tmsize_t bytes = nmemb * elem_size;

-    /*
-     * XXX: Check for integer overflow.
-     */
-    if (nmemb && elem_size && bytes / elem_size == nmemb)
+    /* TIFF size is limited to 32-bit by design and 31-bit by implementation,
+     * so there should be no reason to ever allocate more than INT_MAX bytes.
+     */
+    if ((nmemb <= INT_MAX) && (elem_size <= INT_MAX) && (bytes <= INT_MAX))
         cp = _TIFFrealloc(buffer, bytes);

Note: this patch assumes that tmsize_t is 64 bits; otherwise the check should be done in a different way.
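
For illustration, a standalone sketch of a check that does not assume a 64-bit tmsize_t
(the helper name is made up; in libtiff the typedef comes from tiffio.h rather than the local
stand-in used here):

#include <limits.h>
#include <stdint.h>

/* Local stand-in for libtiff's tmsize_t, which may be 32 or 64 bits. */
typedef int64_t tmsize_t;

/* Returns non-zero if nmemb * elem_size is guaranteed to stay within INT_MAX.
 * Dividing the cap instead of multiplying the operands means the product is
 * never formed before it has been proven safe, so the check works for any
 * width of tmsize_t. */
static int
alloc_within_int_max(tmsize_t nmemb, tmsize_t elem_size)
{
    if (nmemb <= 0 || elem_size <= 0)
        return 0;
    return nmemb <= (tmsize_t)INT_MAX / elem_size;
}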

What do you think?

Regards,
- Nicolas RUFF


Re: Excessive memory allocation while chopping strips

Even Rouault

On mardi 27 juin 2017 22:51:43 CEST Nicolas RUFF wrote:

> Rather than playing this cat-and-mouse game of assigning arbitrary limits
> to every possible tag, I would suggest the following patch in tif_aux.c:
> [...]
> What do you think?

Glad you raised this point. I wanted to raise this topic (and go a bit beyond the particular
case you mention), but haven't yet found the time.

 

BigTIFF may theoretically have strips larger than 4 GB (although it would be insane to craft
such a file), so it might be legitimate to allocate that amount of memory.

 

And in your case that could even be a standard TIFF file with width=1 and height=2 billion that
would fit in 2 GB. So either it has a single strip declared, and ChopUpSingleUncompressedStrip()
is right to try to create a large number of strips that will require more than 2 GB of RAM, or
it has 2 billion strips declared in the StripOffsets/StripByteCounts tags and you might have to
try allocating and reading them from disk (in which case BigTIFF would be needed, since the size
of the tags would be enormous)...

 

But indeed you raise a more general problem: in various places libtiff may allocate enormous
amounts of virtual memory for very short files. I've tried working around a few situations at
the application level in GDAL, but I failed in a few occurrences, for example on an OJPEG file
where TIFFReadDirectory() will force a call to _TIFFFillStriles() due to how the OJPEG codec works.

 

A potential solution would be to check the allocation request against the file size. One downside
of this is that, in one of my use cases, I want to support reading TIFF (some particular
formulations where the IFD is at the beginning) in a purely streaming way (from a pipe), so I
have no idea of the file size. In other use cases, I read a gzipped TIFF in a streaming way from
disk, but knowing the uncompressed size is super costly (it requires uncompressing the whole
file). But I admit those are marginal cases.
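
For illustration, a minimal sketch of such a check for the non-streaming case (the helper
name is made up; TIFFGetFileSize() is the existing wrapper around the size callback):

#include <stdint.h>
#include "tiffio.h"

/* Hypothetical helper: refuse to allocate more bytes for tag/strip data than
 * the file itself could possibly contain. Only meaningful when the size
 * callback is cheap, i.e. not the streaming/pipe cases mentioned above. */
static int
request_fits_in_file(TIFF* tif, uint64_t request)
{
    uint64_t file_size = TIFFGetFileSize(tif);
    return file_size == 0 || request <= file_size;
}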

 

I can see several ways of addressing the issue, which might be complementary:

 

* having a way at the API level to tell libtiff the maximum amount of memory allowed for a single
allocation. I believe TIFFOpen() cannot allocate an insane amount of memory, so such a
TIFFSetMaxMemoryForSingleAllocation() could be called just afterwards.
--> moderately easy to implement (not hard, but requires a pass over the whole code base)

 

* having a way at the API level for applications to reject files they would consider insane, for
example by specifying a maximum width * height. TIFFReadDirectory() would honour that, exiting
early to avoid later processing that might cause excessive memory allocation.
--> easy to implement

 

* instead of allocating a lot of memory up front before reading from the file (typically when
reading the content of a tag), allocate a partial amount, read that from the file, and if
successful realloc() to a larger size and read again, etc., until the whole content has been read.
I've implemented this for strip content fetching lately:
https://github.com/vadz/libtiff/commit/ec74e94dc1f6ac7d28113ea68cdd3ce028150477
It could be replicated for StripOffsets/StripByteCounts reading. There's the following ticket
about that: http://bugzilla.maptools.org/show_bug.cgi?id=2675
--> moderately involved to implement

 

* as ChopUpSingleUncompressedStrip() may allocate memory unrelated to the file size, limit it to
a maximum number of computed strips, say 1 million (see the sketch after this list). This strip
chopping mechanism is a band-aid for malformed files; we don't have to support it for files of
insane dimensions, IMHO. Besides allocating a lot of memory, it can also take a few hundred
milliseconds to compute the synthetic tag values.
--> easy to implement

 

* replicate in libtiff the lazy fetching of the strip offsets and counts (read only the values
for the current strip, actually a bit before/after, and cache them) that I've implemented in
GDAL, so you don't need to fetch the whole tags (that would probably solve my OJPEG case that I
can't work around from GDAL). This would help a lot in situations where you switch back and forth
among the IFDs of the same file (for example, if you display a big image with overviews and zoom
in/out often, you end up re-reading them all the time if you just have a single TIFF handle).
--> involved to implement. Requires auditing the codebase to no longer use td_stripbytecount /
td_stripoffset. Retrieval of the whole content of the corresponding tags through TIFFGetField()
would be kept only for compatibility with external code.
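
As an illustration of the strip-chopping cap suggested above (the sketch referenced in that
item), a standalone outline of what it could look like; the function and constant names are
made up, and in tif_dirread.c the count is derived from the strip byte count and the target
strip size:

#include <stdint.h>

/* Hypothetical cap on how many synthetic strips strip chopping may create;
 * 1 million is the arbitrary figure suggested above. */
#define MAX_CHOPPED_STRIPS ((uint64_t)1000000)

/* Returns the number of strips to chop a single uncompressed strip into,
 * or 0 to tell the caller to keep the original single-strip layout. */
static uint64_t
chopped_strip_count(uint64_t strip_bytecount, uint64_t target_stripbytes)
{
    uint64_t nstrips;

    if (target_stripbytes == 0)
        return 0;
    nstrips = (strip_bytecount + target_stripbytes - 1) / target_stripbytes;
    return (nstrips > MAX_CHOPPED_STRIPS) ? 0 : nstrips;
}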

 

 

Even

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: Excessive memory allocation while chopping strips

Olivier Paquet
2017-06-27 18:08 GMT-04:00 Even Rouault <[hidden email]>:
> But indeed you raise a more general problem. libtiff may allocate in various
> places enormous virtual
>
> memory for very short files. I've tried workarounding a few situations at
> application level in GDAL,

Could you explain why that is a problem?

Olivier

Re: Excessive memory allocation while chopping strips

Even Rouault

On mardi 27 juin 2017 18:43:27 CEST Olivier Paquet wrote:

> Could you explain why that is a problem?

Fuzzers (specifically the address sanitizer) crash in these situations, on the assumption that
attempts at huge memory allocations are a bug. libtiff itself will not crash if a malloc() fails
(or that would be a real bug), but it isn't good practice to attempt allocating a lot of memory
when it is not needed. And indeed the files processed by fuzzers are just a few kilobytes large
at most.

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: Excessive memory allocation while chopping strips

Olivier Paquet
2017-06-27 18:52 GMT-04:00 Even Rouault <[hidden email]>:
>
> Fuzzers (specifically the address sanitizer) crash on this situations,
> assuming that attempts of huge memory allocations are a bug. libtiff itself
> will not crash (or it is a real bug) if a malloc() fails, but this isn't a
> good practice to attempt allocating a lot of memory if it is not needed. And
> indeed the files processed by fuzzers are just a few kilobytes large at
> most.

I think we should be careful not to break libtiff because of a broken
tool then. It's great if we can reduce the huge allocations but it
seems to me that because of the nature of the TIFF format, they will
be hard to eliminate completely without breaking some valid files or
writing a lot of complex code (read: code with new, unknown bugs).
Just something to keep in mind.

With that said, I don't mind limiting the patch to split huge strips.
I don't think we're helping anyone by applying it to files millions of
pixels high. However, won't it just push the potential allocation
problem to strip size? Or are there other checks there?

An API to set an allocation limit is a little tricky as TIFFOpen()
will read the first IFD unless you use 'h' and that could contain
large tags. But otherwise, it looks like something which might be
useful to someone required to deal with unsafe input.
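
For reference, a minimal sketch of the ordering being discussed, assuming the deferred first
directory can then be pulled in with TIFFReadDirectory(); the limit setter is only the API
proposed earlier in this thread and does not exist yet:

#include <stdio.h>
#include "tiffio.h"

int main(void)
{
    /* 'h' reads the header only and defers the first IFD, leaving a window
     * to configure limits before any potentially large tag is loaded. */
    TIFF* tif = TIFFOpen("suspect.tif", "rh");
    if (tif == NULL)
        return 1;

    /* A call such as the proposed TIFFSetMaxMemoryForSingleAllocation()
     * would go here (hypothetical API, not in libtiff today). */

    if (!TIFFReadDirectory(tif))
        fprintf(stderr, "first directory rejected or malformed\n");

    TIFFClose(tif);
    return 0;
}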

Olivier

Re: Excessive memory allocation while chopping strips

Even Rouault

On mardi 27 juin 2017 22:26:40 CEST Olivier Paquet wrote:

> I think we should be careful not to break libtiff because of a broken
> tool then.

I wouldn't call the tool "broken"; it is reasonable behaviour in most cases.

This is the -rss_limit_mb setting of http://llvm.org/docs/LibFuzzer.html, which OSS-Fuzz uses
underneath and which is set to 2 GB by default. It really helps identify places in the code where
large memory allocations are done (and sometimes this isn't just reserving virtual memory, but
really pinning it, which can cause the OOM killer to trigger).

 

> It's great if we can reduce the huge allocations but it
> seems to me that because of the nature of the TIFF format, they will
> be hard to eliminate completely without breaking some valid files or
> writing a lot of complex code (read: code with new, unknown bugs).
> Just something to keep in mind.

 

Agreed. I'd encourage people using libtiff to regularly track the HEAD version to help spot potential new regressions.

 

> With that said, I don't mind limiting the patch to split huge strips.
> I don't think we're helping anyone by applying it to files millions of
> pixels high. However, won't it just push the potential allocation
> problem to strip size? Or are there other checks there?

 

For strip reading, there's the mechanism I already put in place and described in my previous email. Small files that declare a large strip byte count will no longer cause the full allocation to be done before beginning to read the strip content.
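
For illustration, a simplified sketch of that idea, independent of the actual libtiff
implementation linked earlier (the callback type and function names are made up):

#include <stdint.h>
#include <stdlib.h>

/* Stand-in for whatever I/O callback fetches bytes at a given file offset;
 * returns the number of bytes actually read, or -1 on error. */
typedef int64_t (*read_at_fn)(void* ctx, uint64_t off, void* buf, uint64_t n);

/* Grow the buffer in bounded steps and stop as soon as the source runs
 * short, so a tiny file that *claims* a huge strip never triggers a huge
 * up-front allocation. Returns the buffer and sets *got to the bytes read. */
static void*
read_claimed_bytes(void* ctx, read_at_fn read_at, uint64_t off,
                   uint64_t claimed, uint64_t* got)
{
    const uint64_t step = 1024 * 1024;   /* grow by at most 1 MB per round */
    uint64_t done = 0;
    char* buf = NULL;

    *got = 0;
    while (done < claimed) {
        uint64_t want = (claimed - done < step) ? claimed - done : step;
        char* nbuf = (char*)realloc(buf, (size_t)(done + want));
        if (nbuf == NULL) {
            free(buf);
            return NULL;
        }
        buf = nbuf;

        int64_t n = read_at(ctx, off + done, buf + done, want);
        if (n <= 0)
            break;                        /* file shorter than it claimed */
        done += (uint64_t)n;
        if ((uint64_t)n < want)
            break;
    }
    *got = done;
    return buf;
}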

 

> An API to set an allocation limit is a little tricky as TIFFOpen()
> will read the first IFD unless you use 'h' and that could contain
> large tags.

 

Good point (and thanks for advertising this flag, which I didn't know about), but I think that's acceptable for someone using the advanced API.

 

> But otherwise, it looks like something which might be
> useful to someone required to deal with unsafe input.

 

Given the number of teams that fuzz libtiff / tiff tools and report issues in Bugzilla, it seems there's a lot of interest in such use cases.

 

Even

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: Excessive memory allocation while chopping strips

Olivier Paquet
2017-06-28 6:09 GMT-04:00 Even Rouault <[hidden email]>:
> I wouldn't call the tool "broken". It is a reasonable behaviour in most
> cases

I would. It's reporting problems for code which runs fine. It's
causing "fixes" to be written which may introduce new security issues,
the very thing the tool is meant to detect in the first place.

That does not mean we might not still be better off with the tool than
without. But we have to be careful about what we do based on its
reports. It reminds me a lot of the early days of valgrind, where I'd
"fix" some things which we not real bugs to be able to keep using the
tool. They were simpler fixes, mostly uninitialized memory, but even
those could have unwanted consequences (eg. calloc() is a whole lot
slower than malloc()).

> For strip reading, there's the mechanism I already put in place and
> described in my previous email. Small files that would declare large strip
> byte count will no longer cause the full allocation to be done before
> beginning reading the strip content.

Looks like a reasonable fix but:
- It's not that simple (should ideally have a few pairs of eyes go over it).
- It made reading the large strips a little slower.
- It may have increased the peak amount of memory required (in case
realloc needs to copy).
- It does not help with compressed strips, as I understand it.

>> But otherwise, it looks like something which might be
>
>> useful to someone required to deal with unsafe input.
>
>
> Given the number of teams that fuzz libtiff / tiff tools and report issues
> in Bugzilla, it seems there's a lot of interest for such use cases.

I don't know about that. Just because people like to use their shiny
new fuzzer on everything they can get their hands on does not mean
code will be written to use the new API. I might be wrong but I get
the feeling that in many cases libtiff is being dragged along
indirectly with a bunch of other dependencies, typically through some
flavor of an abstract image handling library. Do you know better from
reading the bug reports?

If we're going to have this, I think it should be the default behavior
across the board. We should first fix the cases needed by well formed
large files (ie. progressive allocation for large tags, as you did for
the strips). Then simply refuse strips/tiles larger than X (say, 1 GB
or even 100 MB) and add a flag to TIFFOpen() to override that for the
rare few projects which really need to deal with such files and don't
care about the potential consequences. Making it the default requires
that we make it work well and ensures it will be used. It's also more
work, unfortunately.

Olivier

Re: Excessive memory allocation while chopping strips

Even Rouault

> > For strip reading, there's the mechanism I already put in place and
> > described in my previous email. Small files that declare a large strip
> > byte count will no longer cause the full allocation to be done before
> > beginning to read the strip content.
>
> Looks like a reasonable fix but:
> - It's not that simple (should ideally have a few pairs of eyes go over it).

Reviewing is always welcome. I'd be open to a more formal review process if that's needed, but it would imply people volunteering to review proposed patches in due time. My feeling is that the libtiff development community has not lately been sufficiently large and active for that.

> - It made reading the large strips a little slower.

Agreed. Although potential RAM recopying due to realloc() must be significantly faster than the typical I/O acquisition time.

> - It may have increased the peak amount of memory required (in case

> realloc needs to copy).

Agreed. To limit that issue a bit, I've only enabled it on 64-bit builds, because the main issue might be the lack of virtual address space on 32-bit builds.

We could short-circuit all this non-trivial progressive allocation and reading logic in the case where we know the file size. I wanted to have something that would work without knowing the file size, but I agree my use cases where getting the file size is a slow operation are not common. Since you suggest below a TIFFOpen() flag for something else, I'm wondering if we couldn't have a TIFFOpen() flag that would mean "calling TIFFGetFileSize() is a costly operation on this file handle, please do not do this unless strictly required".

 

> - It does not help with compressed strips, as I understand it.

Hum, it should apply to both uncompressed and compressed strips/tiles AFAIR. And this changeset doesn't apply to the mmap'ed case, since that one knows the file size and already validates the strip size against it.

 

> I don't know about that. Just because people like to use their shiny
> new fuzzer on everything they can get their hands on does not mean
> code will be written to use the new API. I might be wrong but I get
> the feeling that in many cases libtiff is being dragged along
> indirectly with a bunch of other dependencies, typically through some
> flavor of an abstract image handling library. Do you know better from
> reading the bug reports?

 

Most of the time it seems people use the TIFF utilities (tiffcp, tiffsplit, tiff2pdf, etc...) for the fuzzing. Most of the bugs are in the utilities themselves.

 

As far as I'm concerned, most issues I've fixed recently were found through the fuzzing of GDAL and its GTiff driver, which uses libtiff (the GDAL builds I use have an internal libtiff copy that I sync with libtiff CVS HEAD, so it benefits indirectly from the compilation instrumentation flags used by the fuzzing environment; GDAL builds libtiff with CHUNKY_STRIP_READ_SUPPORT).

 

> If we're going to have this, I think it should be the default behavior
> across the board. We should first fix the cases needed by well formed
> large files (ie. progressive allocation for large tags, as you did for
> the strips). Then simply refuse strips/tiles larger than X (say, 1 GB
> or even 100 MB) and add a flag to TIFFOpen() to override that for the
> rare few projects which really need to deal with such files and don't
> care about the potential consequences. Making it the default requires
> that we make it work well and ensures it will be used. It's also more
> work, unfortunately.

 

I didn't dare suggest setting stricter limits in libtiff by default (with a flag to retain the current behaviour, as you suggest), but that would certainly be better protection than opt-in mechanisms. I'm certainly +1 for that.

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: Excessive memory allocation while chopping strips

Olivier Paquet
2017-06-28 10:36 GMT-04:00 Even Rouault <[hidden email]>:
>> - It's not that simple (should ideally have a few pair of eyes go over
>> it).
>
> Reviewing is always welcome. I'd be open to a more formal review process if
> that's needed but it would imply people would volunteer reviewing proposed
> patches in due time. My feeling is that the libtiff development community
> has not lately been sufficiently large and active for that.

Indeed. I barely have time to keep up with the mailing list myself.
And we have yet to formally move to git, something which should have
been done years ago.

>> - It made reading the large strips a little slower.
>
> Agreed. Although potential RAM recopying due to realloc() must be
> significantly faster than the typical I/O acquisition time.

True. I have code which rereads directories from memory but I don't
think it will hit your changes. In any case, I was just pointing out
the non-triviality of fixes.

> We could short-circuit all this non-trivial progressive allocation and
> reading logic in the case where we know the file size. I wanted to have
> something that would work without knowing the file size, but I agree my use
> cases where getting the file size is a slow operation are not common. As you
> suggest below a TIFFOpen() flag for something else, I'm wondering if we
> couldn't have a TIFFOpen() flag that would mean "calling TIFFGetFileSize()
> is a costly operation on this file handle, please do not do this unless
> strictly required".

I think it best to do it in fewer different ways. If your new code
ends up used only in rare cases where getting file size is expensive,
bugs in it are less likely to be found. It's already bad enough if it
is not used in the mmap case, which I think is the default. Even
though your use case is not common, that is no reason to make it even
more difficult.

>> - It does not help with compressed strips, as I understand it.
>
> Hum, it should apply both to uncompressed or compressed strips/tiles AFAIR.
> And this changeset doesn't apply to the mmap'ed case since that one knows
> the file size and already validates the strip size against it.

I think you misunderstood me. I meant the problem that uncompressed
strip size can be quite large and, as far as I know, can't be
validated without decompressing the data. A hard limit on strip size
would cover that neatly and sane applications will keep it reasonable
anyway.

> Most of the time it seems people use the TIFF utilities (tiffcp, tiffsplit,
> tiff2pdf, etc...) for the fuzzing. Most of the bugs are in the utilities
> themselves.

Then having them default to accept only "sane" files should avoid a
lot of false reports. People who actually hit a crappy file could
always override that.

> I didn't dare suggesting setting stricter limits in libtiff by default (with
> a flag to retain current behaviour as you suggest), but that would certainly
> be a better protection than opt-in mechanisms. I'm certainly +1 for that.

If we can't have both "security" and "reading crappy files from some
obscure software from the 90s", I think it's reasonable to pick the
former as a default. I'd like to hear other people's thoughts on that.

Olivier

Re: Excessive memory allocation while chopping strips

Bob Friesenhahn
On Wed, 28 Jun 2017, Olivier Paquet wrote:
>
> I think you misunderstood me. I meant the problem that uncompressed
> strip size can be quite large and, as far as I know, can't be
> validated without decompressing the data. A hard limit on strip size
> would cover that neatly and sane applications will keep it reasonable
> anyway.

I agree that implementing a hard limit on strip/tile size (including
when writing) is beneficial.

The question is what the justification is for using very large
strips/tiles, and what is the largest justifiable strip/tile size?

For a very large image, if a strip is just one row, then it needs to
be allowed to be large enough to support the maximum image width.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: Excessive memory allocation while chopping strips

Paavo Helde
In reply to this post by Olivier Paquet

On 28.06.2017 18:22, Olivier Paquet wrote:
> I think you misunderstood me. I meant the problem that uncompressed
> strip size can be quite large and, as far as I know, can't be
> validated without decompressing the data. A hard limit on strip size
> would cover that neatly and sane applications will keep it reasonable
> anyway.

If I had a vote I would vote against imposing arbitrary size limits. For
example, our company makes software for customers which includes libtiff,
and the customers expect it to read *any* TIFF file they happen to throw
at it, both now and 10 years in the future. If the software does not cope
with a file, they blame us and don't care that this is because we forgot
to switch on some obscure flag, first appearing in libtiff version x.y.z,
which would have restored libtiff's ability to read any file it is
capable of reading.

And yes, these are the customers with 20 GB TIFF files (has happened)
and 10 GB strips (will happen soon). There is no "sane" limit.

Best regards
Paavo


Re: Excessive memory allocation while chopping strips

Rob Tillaart
In reply to this post by Bob Friesenhahn
(long time no see :)

It makes sense to have hard limits in place where exceeding them would cause undefined behaviour (e.g. due to 32-bit overflow, etc.).

Furthermore, soft limits for things that would make systems extremely slow (a zillion allocs, whatever) make sense too, but these
should be overridable if people really want to. Possibly use a #define for the soft limits in a config file?
In the end these limits will depend on the target system where the application runs.


wrt the question:
Very large strips and/or tiles are used by people who expect them to increase the performance of their application in some sense,
and yes, there might be some gain. Most computers can handle megabytes easily (even a $35 Raspberry Pi 3 can do serious processing).
64 GB of memory (on a multicore 64-bit CPU) is not rare anymore in an age of "BIG" data (recall BigTIFF came first ;)

For some types of devices, row-based processing is the natural way of working. If I look e.g. at a wide-format scanner (around 60" or 1.5 m)
at 2400 dpi, that results in 144K pixels per row; at 4 bytes per pixel that is about 0.6 MB per row. It could have 300K rows (120" or 3 m),
resulting in 200 GB image files. Increasing the bit depth to 16 bits per channel brings us to 0.4 TB per image.
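
For reference, a quick standalone check of that arithmetic (8 bits x 4 channels per pixel;
Rob rounds up to 300K rows, hence the 200 GB / 0.4 TB figures above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t dpi           = 2400;
    uint64_t width_inches  = 60;    /* ~1.5 m wide */
    uint64_t length_inches = 120;   /* ~3 m long   */
    uint64_t bytes_per_px  = 4;

    uint64_t px_per_row  = dpi * width_inches;         /* 144,000            */
    uint64_t row_bytes   = px_per_row * bytes_per_px;  /* 576,000 (~0.6 MB)  */
    uint64_t rows        = dpi * length_inches;        /* 288,000 (~300K)    */
    uint64_t image_bytes = row_bytes * rows;           /* ~166 GB, doubling  */
                                                       /* at 16 bits/channel */
    printf("row = %llu bytes, image = %llu bytes\n",
           (unsigned long long)row_bytes, (unsigned long long)image_bytes);
    return 0;
}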

Stitching applications are also quite common, and these can produce extremely large images in one or both dimensions. Large rows would
not be strange.

So in short, there is a need to support large images, which implies large tiles/strips...

<thinking out loud mode started>
I don't think that a strip should be as big as the image width for every possible value. From some distance (abstract level) there are
only tiles (in the extreme, height == 1 && width == imagewidth ==> single row).
If imagewidth is "too" large, one could split that row into e.g. 4 tiles (height == 1 && width == imagewidth/4) to keep it processable.

Maybe all the lib algorithms should become tile based and we should deprecate the strip-based ones...
</>

my 2 cents,
Rob Tillaart




Re: Excessive memory allocation while chopping strips

Bob Friesenhahn
On Wed, 28 Jun 2017, Rob Tillaart wrote:
> wrt the question:
> Very large strips and or tiles are used by people who expect that it will
> increase the performance of their application in some sense,
> and yes there might be some gain.Most computers can handle MB easily (even

Even now, most (Intel) CPUs have L2 caches optimized for looping
through no more than 256k data at a time.  Even if more data is read
into RAM, the processing loops should usually not consume more than
about 128k and performance definitely drops off once the working set
exceeds 256k.

The law of diminishing returns is always present, and given L1, L2,
and (possibly) L3 caches, there is a diminishment whenever there are
accesses to the next level of cache, or main memory.

P.S. I live in Texas, where bigger is always assumed to be better.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: Excessive memory allocation while chopping strips

Paavo Helde

On 28.06.2017 23:37, Bob Friesenhahn wrote:

> On Wed, 28 Jun 2017, Rob Tillaart wrote:
>> wrt the question:
>> Very large strips and or tiles are used by people who expect that it will
>> increase the performance of their application in some sense,
>> and yes there might be some gain.Most computers can handle MB easily (even
> Even now, most (Intel) CPUs have L2 caches optimized for looping
> through no more than 256k data at a time.  Even if more data is read
> into RAM, the processing loops should usually not consume more than
> about 128k and performance definitely drops off once the working set
> exceeds 256k.

Yes, limiting the strip or tile size to some pretty large value would be
reasonable. However, people are not reasonable. Also, lots of processing
software reads the whole image into memory before any processing, so the
potential performance benefits of piecewise processing would be lost
anyway.

Always writing the whole image in a single strip is also simpler than
chopping it up into strips or tiles, so I am sure there are lots of
software pieces out there which write single-strip TIFF files without
using libtiff or any other library (this is actually pretty simple), and
these strips are growing larger all the time because of hardware
advancements. A fully reasonable 2000x2000x16bpp image would already be
8 MB, and it doesn't stop there. The reading part (libtiff) should be
lenient and allow reading such files as long as they fit in memory.

Just my 2 euro cents
Paavo
