BugID: 592157 | State: New | Target Release: No Target | Resolution: Fixed | Database: public
Problem summary:
The COPYCAT_MAIN_PORT environment variable is not used by all machines during CopyCat distributed training
Customer reported version:
NukeX 15.1v4
Customer reported platform:
Rocky 9
Steps to reproduce:
1) Set up distributed training for CopyCat by following the guide here: https://learn.foundry.com/nuke/content/comp_environment/air_tools/cc-dist-manual.html
2) Configure a firewall that only allows connections on the COPYCAT_MAIN_PORT (port 30000 in this case).
3) Attempt to run the distributed training, and observe how the firewall prevents the rank 1 machine from connecting.
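Before starting the training in step 3, it can help to confirm that the firewall really does allow the main port through. The following is a minimal Python sketch of such a check, not part of CopyCat or Nuke; the function name port_is_reachable is made up for illustration, and the host name in the usage note is a placeholder.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for checking a firewall rule; it only tests that
    something accepts TCP connections on the port, nothing CopyCat-specific.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the rank 1 machine while the rank 0 process is listening, a call such as port_is_reachable("rank0.example.com", 30000) would be expected to return True if the firewall rule is in place (the host name here is an assumption, substitute the address of your rank 0 machine).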
Expected behavior:
When distributing CopyCat training over a network, all machines should utilize the port specified by the COPYCAT_MAIN_PORT environment variable.
Actual behavior:
Distributed CopyCat training will use the specified port for incoming connections on the rank 0 machine, but all other machines will actively connect and use random ports for initialization.
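The report does not say how the other machines choose their ports. As a general illustration only, the Python sketch below shows what a "random" port typically looks like at the socket level: a program that binds to port 0 is given whatever free port the operating system picks, which a firewall that only allows the COPYCAT_MAIN_PORT would not match. The function name os_assigned_port is hypothetical and this is not CopyCat's code.

```python
import socket


def os_assigned_port() -> int:
    """Bind a TCP socket to port 0 and return the port the OS selected.

    Illustrative only: shows generic socket behavior, not how CopyCat
    picks its ports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the operating system to choose any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

Calling os_assigned_port() several times usually returns different port numbers, which is why a firewall rule for a single fixed port cannot anticipate them.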
Workaround:
No known workaround at this time.
Reproduced by Development team