Problem summary When running CopyCat distributed training on Windows with an IPv4 address as the main address, Nuke fails to establish a connection and training does not start.
Customer reported version Nuke 15.0v1
Customer reported platform Windows 10
Steps to reproduce
1) Open Command Prompt and run this command: ipconfig
2) Copy the IPv4 address of your Windows machine from the output.
3) Set up distributed training with a command line that looks like the following, and press Enter:
set COPYCAT_MAIN_ADDR=IPv4Addr
set COPYCAT_MAIN_PORT=30000
set COPYCAT_RANK=0
set COPYCAT_WORLD_SIZE=2
./Nuke14.1v1/Nuke14.1 --nukex -i -F 1 -X CopyCat1 --gpu /path/to/attached/script.nk
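The same setup can be scripted, which helps when repeating the reproduction with different addresses or ranks. The following is a minimal sketch, not part of the original report: the helper names (build_copycat_env, build_nuke_command, launch) are hypothetical, 192.0.2.10 is a placeholder documentation address, and the Nuke executable and script paths must be supplied by the caller. Only the four COPYCAT_* variables and the Nuke flags are taken from the command above.

```python
import os
import subprocess
from typing import Dict, List, Optional


def build_copycat_env(main_addr: str, rank: int, world_size: int,
                      port: int = 30000,
                      base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the COPYCAT_* variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "COPYCAT_MAIN_ADDR": main_addr,
        "COPYCAT_MAIN_PORT": str(port),
        "COPYCAT_RANK": str(rank),
        "COPYCAT_WORLD_SIZE": str(world_size),
    })
    return env


def build_nuke_command(nuke_exe: str, script_path: str) -> List[str]:
    """Assemble the Nuke arguments used in the reproduction steps."""
    return [nuke_exe, "--nukex", "-i", "-F", "1", "-X", "CopyCat1",
            "--gpu", script_path]


def launch(nuke_exe: str, script_path: str, env: Dict[str, str]) -> int:
    """Start Nuke with the given environment (not called in this sketch)."""
    return subprocess.run(build_nuke_command(nuke_exe, script_path),
                          env=env).returncode


# Placeholder address; substitute the IPv4 address copied in step 2.
env = build_copycat_env("192.0.2.10", rank=0, world_size=2)
print({k: v for k, v in env.items() if k.startswith("COPYCAT_")})
```

Passing the environment explicitly to the child process avoids depending on the state of the Command Prompt session, so each rank can be started from the same script with only the rank argument changed.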
Expected behaviour Nuke reports that it is waiting to start with a rank of 0.
Actual behaviour CopyCat doesn't train and instead reports an address error:
[W socket.cpp:558] [c10d] The IPv4 network addresses of (IPv4 address, 30000) cannot be retrieved (gai error: -9 - Address family for hostname not supported)
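The "gai error" in this message is a name-resolution failure (getaddrinfo). As a rough diagnostic, the check below asks the operating system whether a given address string resolves as an IPv4 socket address, independently of Nuke. It is a sketch under assumptions: the function name resolves_as_ipv4 is hypothetical, and this does not reproduce the exact call CopyCat makes internally, so a passing result does not rule out the bug.

```python
import socket


def resolves_as_ipv4(host: str, port: int) -> bool:
    """Return True if host resolves to at least one IPv4 socket address."""
    try:
        results = socket.getaddrinfo(host, port,
                                     family=socket.AF_INET,
                                     type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed, for example because the string is not a
        # valid IPv4 address or hostname for this address family.
        return False
    return len(results) > 0


# Loopback is used here as a safe example; substitute the value that
# was assigned to COPYCAT_MAIN_ADDR.
print(resolves_as_ipv4("127.0.0.1", 30000))
```

If this returns False for the address copied from ipconfig, the value itself is worth re-checking (stray characters or whitespace, for instance) before attributing the failure to CopyCat.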
Workaround Use the machine's IPv6 address rather than its IPv4 address.
Reproduced by support This bug has been reproduced in:
Nuke 15.0v1 - Windows
Nuke 14.1v1 - Windows
Earliest version tested Nuke 14.1v1 - This feature did not exist before this version