State: New | Target Release: No Target | Database: public | Resolution: Fixed | Bug ID: 224390
Problem summary
The customer is experiencing random crashes with Nuke 10.0, specifically when loading a Tile during an engine() call with the mt (multithreaded) flag set to true:
Tile tile(input0(), box, ChannelSet(Mask_RGBA), true /* mt */);
... crashes with ...
nuke10: line 1: 46796 Segmentation fault (core dumped)
........................
This is related to, and may have the same root cause as, these two bugs:
209025
Nuke10.0v1 crashes in customer script on dual-CPU machines with many cores
216525
Crash when rendering scripts with Particles, ZDefocus or OFlow on multi core machine without limiting threads with -m
...........................
Originally reported on The Foundry forum here:
http://community.thefoundry.co.uk/discussion/topic.aspx?f=191&t=123952&page=0#1033916
Customer reported version
Nuke 10.0v2 and 9.0v8
Customer reported platform
n/a
Steps to reproduce
(customer report)
...random crashes with nuke 10.0, specifically trying to load a Tile during an engine call with the mt (multithreaded) flag set to true:
Tile tile(input0(), box, ChannelSet(Mask_RGBA), true /* mt */);
... crashes with ...
nuke10: line 1: 46796 Segmentation fault (core dumped)
This call is made under a lock guard during the first engine() call. The crash occurs when I display the node in the viewer, and only when I've previously displayed the node's input in the viewer. I am not sure why.
System details: Ubuntu 14, 128GB RAM. For some reason, this crash doesn't occur under nuke 9.0v7 or earlier (but we would really like to upgrade to nuke 10).
One thing to note is that I'm using a very large format (and corresponding bounding boxes) e.g. 40k * 20k, which I will entirely load in memory. I do not seem to be running out of memory, and there is usually plenty of space in the nuke cache when the crashes occur.
The crash occurs in 'OpTreeHandler::updateCache()' below; meanwhile the engine thread is invoking the destructor of the Tile.
Thread 231 (Thread 0x7f5a86ffe700 (LWP 46357)):
#0 0x00007f5ef85e39a9 in DD::Image::OpTreeHandler::updateCache() const () from /usr/local/Nuke10.0v2/libDDImage.so
#1 0x00007f5ef85e398a in DD::Image::OpTreeHandler::isInAnyTree() const () from /usr/local/Nuke10.0v2/libDDImage.so
#2 0x00007f5ef85d8bc0 in DD::Image::Op::aborted() const () from /usr/local/Nuke10.0v2/libDDImage.so
#3 0x00007f5ef8596c01 in DD::Image::Interest::load_range(int, int) () from /usr/local/Nuke10.0v2/libDDImage.so
#4 0x00007f5ef8596b7f in ?? () from /usr/local/Nuke10.0v2/libDDImage.so
#5 0x00007f5ef8652edc in ?? () from /usr/local/Nuke10.0v2/libDDImage.so
#6 0x00007f5f0b514184 in start_thread (arg=0x7f5a86ffe700) at pthread_create.c:312
#7 0x00007f5f0b82437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 42 (Thread 0x7f5c91ffd700 (LWP 46252)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f5ef8652de1 in DD::Image::Thread::wait(void*) () from /usr/local/Nuke10.0v2/libDDImage.so
#2 0x00007f5ef8597784 in DD::Image::Interest::~Interest() () from /usr/local/Nuke10.0v2/libDDImage.so
#3 0x00007f5ef8597746 in DD::Image::Interest::~Interest() () from /usr/local/Nuke10.0v2/libDDImage.so
#4 0x00007f5ef854c8ac in DD::Image::GeneralTile::~GeneralTile() () from /usr/local/Nuke10.0v2/libDDImage.so
#5 0x00007f5ef854c886 in DD::Image::GeneralTile::~GeneralTile() () from /usr/local/Nuke10.0v2/libDDImage.so
#6 0x00007f5d305fa348 in ~RawGeneralTile (this=0x7f5c91ff8020, __in_chrg=<optimized out>) at /usr/local/Nuke10.0v2/include/DDImage/RawGeneralTile.h:18
#7 SimpleTile::engine (this=0x619cbd0, y=<optimized out>, x=<optimized out>, r=<optimized out>, channels=..., outRow=...) at /home/dev/SimpleTile.cpp:96
#8 0x00007f5ef850f95e in ?? () from /usr/local/Nuke10.0v2/libDDImage.so
#9 0x00007f5ef850ffa8 in ?? () from /usr/local/Nuke10.0v2/libDDImage.so
Reproducible, minimal example:
#include <stdlib.h>
#include <iostream>
#include <DDImage/Iop.h>
#include <DDImage/Knobs.h>
#include <DDImage/Row.h>
#include <DDImage/Thread.h>
#include <DDImage/Tile.h>

// General plugin utils
using namespace DD::Image;

class SimpleTile : public Iop
{
public:
  SimpleTile(Node* node) : Iop(node) { inputs(1); }
  virtual ~SimpleTile() {}

  const char* Class() const { return "SimpleTile"; }
  const char* node_help() const { return "SimpleTile"; }
  Op* default_input(int input) const { return 0; }
  bool firstEngineRendersWholeRequest() const { return true; }

  void _validate(bool);
  void _request(int x, int y, int r, int t, ChannelMask channels, int count);
  void engine(int y, int x, int r, ChannelMask channels, Row& outRow);

private:
  static const Iop::Description description;
  Lock _lock;        // serialises the whole-input load in engine()
  Format _format;
  Hash _input0hash;  // hash of the input at the time of the last load
};

void SimpleTile::_validate(bool for_real)
{
  if (!input(0))
  {
    info_.set(DD::Image::Box());
    info_.channels(Mask_None);
    return;
  }
  copy_info(0);
  info_.channels(Mask_RGBA);
}

void SimpleTile::_request(int x, int y, int r, int t, ChannelMask channels,
                          int count)
{
  if (input(0))
  {
    int i0_width = input0().info().w();
    int i0_height = input0().info().h();
    // Request the whole input, regardless of the region asked for.
    input0().request(input0().info(), Mask_RGBA, count);
  }
}

void SimpleTile::engine(int y, int x, int r, ChannelMask channels, Row& outRow)
{
  if (!input(0))
  {
    foreach (c, channels)
      outRow.erase(c);
    return;
  }
  {
    Guard guard(_lock);
    // Reload the whole input only when its hash has changed.
    if (input0().hash() != _input0hash) {
      Box box(input0().info());
      Tile tile(input0(),
                box,
                ChannelSet(Mask_RGBA),
                true /* mt */);
      if (this->Op::aborted()) {
        return;
      }
      std::cout << "done " << std::endl; // here I would do something more interesting with the data.
      _input0hash = input0().hash();
    }
  }
  foreach (channel, channels)
  {
    outRow.erase(channel);
  }
}

static Iop* STCreate(Node* node)
{
  return (new SimpleTile(node));
}

const Iop::Description SimpleTile::description("SimpleTile",
                                               "Org/SimpleTile",
                                               STCreate);
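For readers without the NDK to hand, the locking pattern that engine() uses (take a lock, compare the input hash with the stored one, do the expensive load only on a mismatch) can be sketched in standard C++. This is only an illustration of the pattern, not of the crash: OncePerHashLoader, countLoads and the integer hash are hypothetical names invented for this sketch, std::mutex stands in for DD::Image::Lock, and a counter stands in for the Tile load.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the plugin; none of these names are NDK API.
class OncePerHashLoader {
public:
    // Called concurrently by many worker threads, as engine() is.
    void engine(std::uint64_t inputHash) {
        std::lock_guard<std::mutex> guard(lock_);  // plays the role of Guard guard(_lock)
        if (inputHash != cachedHash_) {
            ++loadCount_;             // placeholder for the expensive whole-input load
            cachedHash_ = inputHash;  // later callers with the same hash skip the load
        }
    }

    int loadCount() const {
        std::lock_guard<std::mutex> guard(lock_);
        return loadCount_;
    }

private:
    mutable std::mutex lock_;
    std::uint64_t cachedHash_ = 0;  // assumption: 0 means "nothing loaded yet"
    int loadCount_ = 0;
};

// Runs `threads` concurrent engine() calls with the same hash and
// returns how many of them actually performed the load.
int countLoads(int threads, std::uint64_t inputHash) {
    assert(threads > 0);
    OncePerHashLoader op;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&op, inputHash] { op.engine(inputHash); });
    for (auto& w : workers)
        w.join();
    return op.loadCount();
}
```

Calling countLoads(8, 42) returns 1: whichever thread takes the lock first does the load, and the other seven see a matching hash and skip it. In the customer's plugin the load inside the guarded block is itself multithreaded (mt = true), which is the part this sketch deliberately leaves out.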
Reproduced by support
Not reproduced by Support.
Engineering will use the customer's code and scenario to test a fix for the core issues behind #209025 and #216525, which will also reveal whether this issue has the same root cause.
Expected behaviour
No crash: the Tile loads and the node displays in the viewer.
Actual behaviour
Nuke crashes with a segmentation fault.