Hello,
When our DLT pipeline runs on a serverless cluster, the notebook first tries to install some Python wheels (.whl files) onto the cluster. During development, when we run the pipeline several times in quick succession, the first run succeeds, but the second run fails with the following error:
Processing /Workspace/Shared/libraries/CompanyName/common_library-3.1.1rc1-py3-none-any.whl
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3d9bb12570>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/build/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3d9b6e71a0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/build/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3d9b51cb30>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/build/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3d9b51cd40>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/build/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3d9b51cf20>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/build/
INFO: pip is looking at multiple versions of common-library to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement build==1.2.1 (from common-library) (from versions: none)
ERROR: No matching distribution found for build==1.2.1
In the cluster logs I can see the logs from the earlier runs, including the successful library installs, so I am at a loss as to how the cluster can lose its connection on the second run.
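For anyone triaging the same output, here is a minimal sketch of how the pip log above can be summarized. It is plain Python, not Databricks tooling. The function name `summarize_pip_failure` is hypothetical, and the sample text is abbreviated from the log in this post. It only reports what the log already says: the network was unreachable, and the requirement pip could not resolve was `build==1.2.1`, a dependency of common-library rather than the wheel itself.

```python
import re

# Matches pip's final error line and captures the requirement it names.
NO_MATCH = re.compile(
    r"^ERROR: No matching distribution found for (\S+)", re.MULTILINE
)


def summarize_pip_failure(log_text: str) -> dict:
    """Hypothetical helper: pull the key facts out of a pip install log."""
    return {
        # True when any retry was caused by an unreachable network.
        "network_unreachable": "Network is unreachable" in log_text,
        # Requirements pip could not find on the package index.
        "missing_requirements": NO_MATCH.findall(log_text),
    }


# Abbreviated copy of the log lines from this post ("..." marks trimmed text).
sample = """\
Processing /Workspace/Shared/libraries/CompanyName/common_library-3.1.1rc1-py3-none-any.whl
WARNING: Retrying (Retry(total=4, ...)) after connection broken by 'NewConnectionError(... [Errno 101] Network is unreachable')': /simple/build/
ERROR: Could not find a version that satisfies the requirement build==1.2.1 (from common-library) (from versions: none)
ERROR: No matching distribution found for build==1.2.1
"""

summary = summarize_pip_failure(sample)
print(summary)
```

If the summary shows a missing requirement that is not one of our own wheels, the failure is in fetching a transitive dependency from the package index, which is consistent with the `/simple/build/` path in the retries.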