ID 394102 - Katana deadlocks if renderboot cannot be terminated (Linux)

Problem summary

On Linux, when the user triggers a Preview Render while a previous renderboot instance is still running, Katana attempts to terminate that process. By default, SIGTERM is sent. If renderboot is deadlocked, SIGTERM will not be handled, and Katana will deadlock as well.

Customer reported version

Katana 3.1v4

TEST PLAN 1:

  • In the Node Graph tab, add a CameraCreate node and an OpScript in a chain.
  • Double-click the OpScript node to set the view and edit flags on it.
  • In the Parameters tab, set the CEL parameter of the OpScript node to /root/world.
  • Also set its script.lua parameter to:

if Interface.AtRoot() then
    Interface.CreateChild('expensive_children')
else
    if Interface.GetOutputName() == 'expensive_children' then
        Interface.CreateChild('child')
    else
        local ffi = require "ffi"
        ffi.cdef "unsigned int sleep(unsigned int seconds);"
        ffi.C.sleep(10)
    end
end

  • Start a Preview Render from the OpScript node.
  • Cancel the render straight away by triggering the Render > Cancel All Renders [Shift+Esc] action from the top menu bar.

Actual behavior:

  • On Linux, the Katana UI will freeze for up to 10 seconds, provided that DEFAULT_RENDER_TERMINATION_SIGNAL is left at its default value, SIGTERM. Note that Arnold uses SIGKILL by default, so this issue does not affect Arnold in its default configuration.

Expected behavior:

  • The render is cancelled and the UI does not freeze. The renderboot process may still linger in the background for up to 10 seconds, though. If the sleep in the OpScript node were changed to, for example, 200 seconds, the renderboot process could linger for up to 60 seconds, which is the default grace period configured via KATANA_CANCELLED_RENDER_PROCESS_GRACE_PERIOD.
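
The expected behavior above (send the termination signal, return control to the UI immediately, and only force the process to exit once a grace period has elapsed) can be sketched in Python as follows. This is an illustration only and not Katana's actual implementation: the helper name cancel_without_blocking is hypothetical, and reading KATANA_CANCELLED_RENDER_PROCESS_GRACE_PERIOD as a number of seconds is an assumption based on the 60-second default mentioned above.

```python
import os
import signal
import subprocess
import threading


def cancel_without_blocking(proc, grace_seconds=None):
    """Ask a child process to stop without blocking the calling thread.

    Hypothetical helper for illustration. SIGTERM is sent first; if the
    process is still alive after the grace period, it is sent SIGKILL.
    """
    if grace_seconds is None:
        # Assumption: the variable holds a number of seconds (default 60).
        grace_seconds = float(
            os.environ.get("KATANA_CANCELLED_RENDER_PROCESS_GRACE_PERIOD", "60")
        )

    # Polite request; a deadlocked or sleeping process may never act on it.
    proc.send_signal(signal.SIGTERM)

    def reap():
        try:
            # Wait on a background thread so the caller is never blocked.
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Grace period elapsed: SIGKILL cannot be caught or ignored.
            proc.kill()
            proc.wait()

    reaper = threading.Thread(target=reap, daemon=True)
    reaper.start()
    return reaper
```

The function returns as soon as the signal has been sent, so a caller such as a UI thread stays responsive; the returned thread can be joined later if the caller needs to know when the process has actually gone away.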

TEST PLAN 2:

  • In the Node Graph tab, add a CameraCreate node and an OpScript in a chain.
  • Double-click the OpScript node to set the view and edit flags on it.
  • In the Parameters tab, set the CEL parameter of the OpScript node to /root/world.
  • Also set its script.lua parameter to:

if Interface.AtRoot() then
    Interface.CreateChild('expensive_children')
else
    if Interface.GetOutputName() == 'expensive_children' then
        Interface.CreateChild('child')
    else
        local ffi = require "ffi"
        ffi.cdef "unsigned int sleep(unsigned int seconds);"
        ffi.C.sleep(10)
    end
end

Test 1

  • Start a Live Render from the OpScript node.
  • Start another Live Render from the OpScript node.

Actual behavior:

  • On Linux, the Katana UI will freeze for up to 10 seconds while the first Live Render is being cancelled; eventually the second Live Render will start.

Expected behavior:

  • The first Live Render will be cancelled straight away, and the second Live Render will start immediately.

Test 2

  • Start a Preview Render or Live Render from the OpScript node.
  • Close Katana.

Actual behavior:

  • On Linux, Katana will not fully exit until about 10 seconds later.

Expected behavior:

  • Katana will exit immediately, and control will be returned to the terminal from which Katana was launched.

Workaround

Setting DEFAULT_RENDER_TERMINATION_SIGNAL to SIGKILL will close the render process and allow Katana to start the next render.

This can be set up in the shell before launching Katana with: export DEFAULT_RENDER_TERMINATION_SIGNAL=SIGKILL

See Environment Variables in the Katana Developer Guide for more information.

Arnold has added its own environment variable, ARNOLD_RENDER_TERMINATION_SIGNAL, which is set to SIGKILL by default.
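
Where Katana is started from a launcher script rather than an interactive shell, the same workaround can be applied by setting the variable in the environment passed to the process. The following is a minimal sketch, assuming a Python wrapper; the function names and the "katana" executable name (assumed to be on PATH) are hypothetical and should be adapted to the local installation.

```python
import os
import subprocess


def build_katana_env(base_env=None):
    """Return a copy of the environment with the workaround applied."""
    env = dict(os.environ if base_env is None else base_env)
    # Workaround from this article: use SIGKILL instead of the default SIGTERM.
    env["DEFAULT_RENDER_TERMINATION_SIGNAL"] = "SIGKILL"
    return env


def launch_katana(executable="katana"):
    """Start Katana with the modified environment.

    Assumption: "katana" resolves to the Katana launcher on PATH.
    """
    return subprocess.Popen([executable], env=build_katana_env())
```

Building the environment as a copy leaves the calling process's own environment untouched, so the setting only affects the Katana session started by the wrapper.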

Reproduced by support

This bug has been reproduced in:
PRMAN and ARNOLD (with the ARNOLD_RENDER_TERMINATION_SIGNAL variable manually set to SIGTERM)

Katana3.1v4 - CentOS7
Katana3.1v1 - CentOS7
Katana3.0v1 - CentOS7
Katana2.6v1 - CentOS7
Katana2.5v1 - CentOS7
Katana2.1v1 - CentOS7

Unable to reproduce the bug in:
Katana3.1v4 - Windows 7

Earliest version tested

This issue appears to be present in all versions of the product.
