API

Python API Reference

`Cluster`([name, software, container, ...])	Create a Dask cluster with Coiled
`cluster_logs`(cluster_id[, account, ...])	Returns cluster logs as a dictionary, with a key for the scheduler and each worker.
`create_software_environment`([name, account, ...])	Create a software environment
`delete_cluster`([name, cluster_id, account, ...])	Delete a cluster
`delete_software_environment`(name[, account, ...])	Delete a software environment
`diagnostics`([account])	Run a diagnostic check aimed to help support with any issues.
`get_billing_activity`([account, cluster, ...])	Retrieve Billing information.
`get_software_info`(name[, account, workspace])	Retrieve solved spec for a Coiled software environment
`list_clusters`([account, workspace, ...])	List clusters
`list_core_usage`([account])	Get a list of used cores.
`list_gpu_types`()	List allowed GPU Types.
`list_instance_types`([backend, min_cores, ...])	List allowed instance types for the cloud provider configured on your account.
`BackendOptions`	A dictionary with the following key/value pairs
`AWSOptions`	A dictionary with the following key/value pairs plus any pairs in `BackendOptions`
`GCPOptions`	A dictionary with GCP specific key/value pairs plus any pairs in `BackendOptions`
`list_local_versions`()	Get information about local versions.
`list_performance_reports`([account])	List performance reports stored on Coiled Cloud
`list_software_environments`([account, workspace])	List software environments
`list_user_information`()	List information about your user.
`performance_report`([filename, private, account])	Generates a static performance report and saves it to Coiled Cloud
`set_backend_options`([account, workspace, ...])	Configure workspace-level settings for cloud provider and container registry.
`function`(*[, software, container, vm_type, ...])	Decorate a function to run on cloud infrastructure

Software Environments

coiled.create_software_environment(name=None, *, account=None, workspace=None, conda=None, pip=None, container=None, log_output=sys.stdout, force_rebuild=False, use_entrypoint=True, gpu_enabled=False, arm=False, architecture=ArchitectureTypesEnum.X86_64, region_name=None, include_local_code=False, ignore_local_packages=None, use_uv_installer=True)

Create a software environment
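
A minimal sketch of how the keyword arguments for this call might be assembled. The signature above does not state the accepted types for `conda` and `pip`, so the list forms used here are assumptions, and the environment name and package names are placeholders. `software_env_kwargs` and `create_env` are hypothetical helpers; the `coiled` import is deferred so the argument-building part can be inspected without an account.

```python
def software_env_kwargs(name: str) -> dict:
    """Collect keyword arguments for coiled.create_software_environment."""
    return {
        "name": name,
        # Assumption: conda accepts a list of package specifications.
        "conda": ["python=3.11", "dask"],
        # Assumption: pip accepts a list of requirement strings.
        "pip": ["requests"],
        # Keywords below appear in the signature with these defaults.
        "force_rebuild": False,
        "gpu_enabled": False,
    }


def create_env(name: str = "example-env") -> None:
    """Create the environment; needs the coiled package and a configured account."""
    import coiled  # imported lazily on purpose

    coiled.create_software_environment(**software_env_kwargs(name))


kwargs = software_env_kwargs("example-env")
```

Calling `create_env()` would then submit the build; nothing is sent to Coiled until that function runs.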

Clusters

class coiled.Cluster(name=None, *, software=None, container=None, ignore_container_entrypoint=None, n_workers=None, worker_class=None, worker_options=None, worker_vm_types=None, worker_cpu=None, worker_memory=None, worker_disk_size=None, worker_disk_throughput=None, worker_disk_config=None, worker_gpu=None, worker_gpu_type=None, scheduler_options=None, scheduler_vm_types=None, scheduler_cpu=None, scheduler_memory=None, scheduler_disk_size=None, scheduler_disk_config=None, scheduler_gpu=None, asynchronous=False, cloud=None, account=None, workspace=None, shutdown_on_close=None, idle_timeout=None, cluster_timeout=None, no_client_timeout=<object object>, use_scheduler_public_ip=None, use_dashboard_https=None, dashboard_custom_subdomain=None, credentials='local', credentials_duration_seconds=None, timeout=None, environ=None, tags=None, send_dask_config=True, unset_single_threading_variables=None, backend_options=None, show_widget=None, custom_widget=None, configure_logging=None, wait_for_workers=None, package_sync=None, package_sync_strict=False, package_sync_conda_extras=None, package_sync_ignore=None, package_sync_only=None, package_sync_fail_on='critical-only', package_sync_use_uv_installer=True, private_to_creator=None, use_best_zone=None, allow_cross_zone=False, compute_purchase_option=None, spot_policy=None, extra_worker_on_scheduler=None, _n_worker_specs_per_host=None, scheduler_port=None, allow_ingress_from=None, allow_ssh_from=None, allow_ssh=None, allow_spark=None, open_extra_ports=None, jupyter=None, mount_bucket=None, host_setup_script=None, region=None, arm=None, batch_job_ids=None, batch_job_container=None, scheduler_sidecars=None, worker_sidecars=None, pause_on_exit=None, skip_ssd_mount=None, filestores_to_attach=None)#

Create a Dask cluster with Coiled

Parameters:

n_workers (Union[int, List[int], None]) – Number of workers in this cluster. Can either be an integer for a static number of workers, or a [min, max] list specifying the lower and upper bounds for adaptively scaling up/down workers depending on the amount of work submitted. For adaptive scaling, you can also specify an initial number of workers as [min, initial, max]. Defaults to n_workers=[4, 20] which adaptively scales between 4 and 20 workers, and initially starts with 4 workers.
name (str | None) – Name to use for identifying this cluster. Defaults to None.
software (str | None) – Name of the software environment to use; this allows you to use and re-use existing Coiled software environments. Specifying this argument will disable package sync, and it cannot be combined with container.
container (str | None) – Name or URI of container image to use; when using a pre-made container image with Coiled, this allows you to skip the step of explicitly creating a Coiled software environment from that image. Specifying this argument will disable package sync, and it cannot be combined with software.
ignore_container_entrypoint (bool | None) – Ignore entrypoint for specified Docker container (like docker run --entrypoint); default is to use the entrypoint (if any) set on the image.
worker_class (str | None) – Worker class to use. Defaults to distributed.nanny.Nanny.
worker_options (dict | None) – Mapping with keyword arguments to pass to worker_class. Defaults to {}.
worker_vm_types (list | None) – List of instance types that you would like workers to use; the default instance type has 4 cores. You can use coiled.list_instance_types() to see a list of allowed types.
worker_cpu (Union[int, List[int], None]) – Number, or range, of CPUs requested for each worker. Specify a range by using a list of two elements, for example: worker_cpu=[2, 8].
worker_memory (Union[str, List[str], None]) – Amount of memory to request for each worker. Coiled will use a +/- 10% buffer around the memory that you specify. You may specify a range of memory by using a list of two elements, for example: worker_memory=["2GiB", "4GiB"].
worker_disk_size (int | str | None) – Non-default size of persistent disk attached to each worker instance, specified as string with units or integer for GiB.
worker_disk_throughput (int | None) – EXPERIMENTAL. For AWS, non-default throughput (in MB/s) for EBS gp3 volumes attached to workers.
worker_disk_config (dict | None) – Allows custom configuration of the disk attached to worker VMs. For AWS, this can be EbsBlockDevice dictionary that overrides our default EBS config.
worker_gpu (Union[int, bool, None]) – Number of GPUs to attach to each worker. Default is 0, True is interpreted as 1. Note that this is ignored if you’re explicitly specifying an instance type which includes a fixed number of GPUs.
worker_gpu_type (str | None) – For GCP, this lets you specify type of guest GPU for instances. Should match the way the cloud provider specifies the GPU, for example: worker_gpu_type="nvidia-tesla-t4". By default, Coiled will request NVIDIA T4 if GPU type isn’t specified. For AWS, if you want GPU other than T4, you’ll need to explicitly specify the VM instance type (e.g., g6.xlarge for instance with one NVIDIA L4 GPU).
scheduler_options (dict | None) – Mapping with keyword arguments to pass to the Scheduler __init__. Defaults to {}.
scheduler_vm_types (list | None) – List of instance types that you would like the scheduler to use; the default instance type has 4 cores. You can use coiled.list_instance_types() to see a list of allowed types.
scheduler_cpu (Union[int, List[int], None]) – Number, or range, of CPUs requested for the scheduler. Specify a range by using a list of two elements, for example: scheduler_cpu=[2, 8].
scheduler_memory (Union[str, List[str], None]) – Amount of memory to request for the scheduler. Coiled will use a +/- 10% buffer around the memory that you specify. You may specify a range of memory by using a list of two elements, for example: scheduler_memory=["2GiB", "4GiB"].
scheduler_gpu (bool | None) – Whether to attach GPU to scheduler; this would be a single NVIDIA T4. The best practice for Dask is to have a GPU on the scheduler if you are using GPUs on your workers, so if you don’t explicitly specify, Coiled will follow this best practice and give you a scheduler GPU whenever you have worker_gpu set.
scheduler_disk_config (dict | None) – Allows custom configuration of the disk attached to scheduler VM. For AWS, this can be EbsBlockDevice dictionary that overrides our default EBS config.
asynchronous (bool) – Set to True if using this cluster within async/await functions or within Tornado gen.coroutines. Otherwise this should remain False for normal use. Default is False.
cloud (CloudV2 | None) – Cloud object to use for interacting with Coiled. This object contains user/authentication/account information. If this is None (default), we look for a recently-cached Cloud object, and if none exists create one.
account (str | None) – DEPRECATED. Use workspace instead.
workspace (str | None) – The Coiled workspace (previously “account”) to use. If not specified, will check the coiled.workspace or coiled.account configuration values, or will use your default workspace if those aren’t set.
shutdown_on_close (bool | None) – Whether or not to shut down the cluster when it finishes. Defaults to True, unless name points to an existing cluster.
idle_timeout (str | None) – Shut down the cluster after this duration if no activity has occurred, e.g. “30 minutes”. Default: “20 minutes”.
cluster_timeout (str | None) – Shut down the cluster after this duration (even if active), e.g. “2 hours”. Default: None.
no_client_timeout (str | None | object) – Shut down the cluster after this duration after all clients have disconnected. When shutdown_on_close is False this is disabled, since shutdown_on_close=False usually means you want to keep cluster up after disconnecting so you can later connect a new client. Default: “2 minutes”, or idle_timeout if there’s a non-default idle timeout
use_scheduler_public_ip (bool | None) – Whether the Python client connects to the Dask scheduler using the scheduler machine’s public IP address. When set to True (the default behaviour), the client connects to the scheduler using its public IP address, which means traffic will be routed over the public internet. When set to False, traffic will be routed over the local network the scheduler lives in, so make sure the scheduler’s private IP address is routable from where this call is made.
use_dashboard_https (bool | None) – When public IP address is used for dashboard, we’ll enable HTTPS + auth by default. You may want to disable this if using something that needs to connect directly to the scheduler dashboard without authentication, such as jupyter dask-labextension<=6.1.0.
credentials (str | None) – Which credentials to use for Dask operations and forward to Dask clusters – options are “local”, or None. The default behavior is to use local credentials if available. NOTE: credential handling currently only works with AWS credentials.
credentials_duration_seconds (int | None) – For “local” credentials shipped to cluster as STS token, set the duration of STS token. If not specified, the AWS default will be used.
timeout (Union[int, float, None]) – Timeout in seconds to wait for a cluster to start. By default, the default_cluster_timeout set on the parent Cloud is used.
environ (Optional[Dict[str, str]]) – Dictionary of environment variables. Values will be transmitted to Coiled; for private environment variables (e.g., passwords or access keys you use for data access), send_private_envs() is recommended.
send_dask_config (bool) – Whether to send a frozen copy of local dask.config to the cluster.
unset_single_threading_variables (bool | None) – By default, Dask sets environment variables such as OMP_NUM_THREADS and MKL_NUM_THREADS so that relevant libraries use a single thread per Dask worker (by default there are as many Dask workers as CPU cores). In some cases this is not what you want, so this option overrides the default Dask behavior.
backend_options (Union[AWSOptions, GCPOptions, None]) – Dictionary of backend specific options.
show_widget (bool | None) – Whether to use the rich-based widget display. By default, the widget will show in IPython/Jupyter; specify show_widget=True to make widget show even when not in IPython/Jupyter (for example, when making cluster in code invoked via CLI). For use cases involving multiple Clusters at once, show_widget=False is recommended.
custom_widget (ClusterWidget | None) – Use the rich-based widget display outside of IPython/Jupyter (Default: False)
tags (Optional[Dict[str, str]]) – Dictionary of tags. Can also be set using the coiled.tags Dask configuration option. Tags specified for cluster using keyword argument take precedence over those from Dask configuration.
wait_for_workers (Union[int, float, bool, None]) – Whether to wait for a number of workers before returning control of the prompt back to the user. Usually, computations will run better if you wait for most workers before submitting tasks to the cluster. You can wait for all workers by passing True, or not wait for any by passing False. You can pass a fraction of the total number of workers requested as a float (like 0.6), or a fixed number of workers as an int (like 13). If None, the value from coiled.wait-for-workers in your Dask config will be used. Default: 0.3. If the requested number of workers don’t launch within 10 minutes, the cluster will be shut down, then a TimeoutError is raised.
package_sync (Union[bool, List[str], None]) – DEPRECATED – Always enabled when container and software are not given. Synchronize package versions between your local environment and the cluster. Cannot be used with the container or software options. Passing specific packages as a list of strings will attempt to synchronize only those packages, use with caution. (Deprecated: use package_sync_only instead.) We recommend reading the additional documentation for this feature
package_sync_conda_extras (Optional[List[str]]) – A list of conda package names (available on conda-forge) to include in the environment that are not in your local environment. Use with caution, as this can lead to dependency conflicts with local packages. Note, this will only work for conda packages with platform-specific builds (i.e., not “noarch” packages).
package_sync_ignore (Optional[List[str]]) – A list of package names to exclude from the environment. Note their dependencies may still be installed, or they may be installed by another package that depends on them!
package_sync_only (Optional[List[str]]) – A list of package names to only include from the environment. Use with caution. We recommend reading the additional documentation for this feature.
package_sync_strict (bool) – Only allow exact package matches; not recommended unless your client platform/architecture matches the cluster platform/architecture.
package_sync_use_uv_installer (bool) – Use uv to install pip packages when building the software environment. This should only be disabled if you are experiencing issues with uv and need to use pip instead. (Default: True)
private_to_creator (bool | None) – Only allow the cluster creator, not other members of team account, to connect to this cluster.
use_best_zone (bool | None) – Allow the cloud provider to pick the zone (in your specified region) that has best availability for your requested instances. We’ll keep the scheduler and workers all in a single zone in order to avoid any cross-zone network traffic (which would be billed).
allow_cross_zone (bool) – Allow the cluster to have VMs in distinct zones. There’s a cost for cross-zone traffic (usually pennies per GB), so this is a bad choice for shuffle-heavy workloads, but can be a good choice for large embarrassingly parallel workloads.
spot_policy (Optional[Literal['on-demand', 'spot', 'spot_with_fallback']]) – Purchase option to use for workers in your cluster. Options are “on-demand”, “spot”, and “spot_with_fallback”; by default this is “on-demand”. (Google Cloud refers to this as “provisioning model” for your instances.) Spot instances are much cheaper, but can have more limited availability and may be terminated while you’re still using them if the cloud provider needs more capacity for other customers. On-demand instances have the best availability and are almost never terminated while still in use, but they’re significantly more expensive than spot instances. For most workloads, “spot_with_fallback” is likely to be a good choice: Coiled will try to get as many spot instances as we can, and if we get fewer than you requested, we’ll try to get the remaining instances as on-demand. For AWS, when we’re notified that an active spot instance is going to be terminated, we’ll attempt to get a replacement instance (spot if available, but could be on-demand if you’ve enabled “fallback”). Dask on the active instance will attempt a graceful shutdown before the instance is terminated so that computed results won’t be lost.
scheduler_port (int | None) – Specify a port other than the default (443) for communication with Dask scheduler. Usually the default is the right choice; Coiled supports using 443 concurrently for scheduler comms and for scheduler dashboard.
allow_ingress_from (str | None) – Control the CIDR from which cluster firewall allows ingress to scheduler; by default this is open to any source address (0.0.0.0/0). You can specify CIDR, or “me” for just your IP address.
allow_ssh_from (str | None) – Allow connections to scheduler over port 22 (used for SSH) for a specified IP address or CIDR.
allow_ssh (bool | None) – Allow connections to scheduler over port 22, used for SSH.
allow_spark (bool | None) – Allow (secured) connections to scheduler on port 15003 used by Spark Connect. By default, this port is open.
jupyter (bool | None) – Start a Jupyter server in the same process as Dask scheduler. The Jupyter server will be behind HTTPS with authentication (unless you disable use_dashboard_https, which we strongly recommend against). Note that jupyterlab will need to be installed in the software environment used on the cluster (or in your local environment if using package sync). Once the cluster is running, you can use jupyter_link to get link to access the Jupyter server.
mount_bucket (Union[str, List[str], None]) – Optional name or list of names of buckets to mount. For example, "s3://my-s3-bucket" will mount the S3 bucket my-s3-bucket, using your forwarded AWS credentials, and "gs://my-gcs-bucket" will mount the GCS bucket my-gcs-bucket using your forwarded Google Application Default Credentials. Buckets are mounted to subdirectories in both /mount and ./mount (relative to working directory for Dask); the subdirectory name is taken from the bucket name. By default, mounting times out after 30 s. You can manually configure a different timeout using the coiled.mount-bucket.timeout configuration value.
host_setup_script (str | None) – Script to run on the host VM during the setup process. You can either specify as text of the script to run, or as path to a local script file to copy and run.
region (str | None) – The cloud provider region in which to run the cluster.
arm (bool | None) – Use ARM instances for cluster; default is x86 (Intel) instances.
scheduler_sidecars (list[dict] | None) – Optional list of additional containers to run as sidecars on the scheduler. For example, scheduler_sidecars=[{"name": "test", "container": "foo/foo:latest", "command": "run_something"}] will start the foo/foo:latest container with run_something as the command. Note that the VM will shut itself down once the container exits, so sidecar commands are expected to be things that will keep running.
worker_sidecars (list[dict] | None) – Like scheduler_sidecars, but run on worker VMs instead of scheduler.
pause_on_exit (bool | None) – Pause the cluster instead of shutting it down when exiting.
skip_ssd_mount (bool | None) – Don’t automatically mount SSD/NVMe (if any) to /scratch.
filestores_to_attach (list[dict] | None) – List of filestores to attach (specified as {"id": id, "input": True, "output": True}, not name).
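
The sketch below combines a few of the parameters above and spells out the three documented shapes of `n_workers`. `describe_n_workers` and `start_cluster` are hypothetical helpers written for illustration, not Coiled's own parsing code. That a `[min, max]` pair starts at the minimum is inferred from the description of the default `[4, 20]`. The tag values are placeholders, and the `coiled` import is deferred so the rest runs without an account.

```python
from typing import List, Union


def describe_n_workers(n_workers: Union[int, List[int]]) -> dict:
    """Interpret n_workers as a static count or an adaptive range."""
    if isinstance(n_workers, int):
        return {"adaptive": False, "initial": n_workers}
    if len(n_workers) == 2:
        minimum, maximum = n_workers
        initial = minimum  # assumption: [min, max] starts at the minimum
    elif len(n_workers) == 3:
        minimum, initial, maximum = n_workers
    else:
        raise ValueError("expected int, [min, max] or [min, initial, max]")
    return {
        "adaptive": True,
        "minimum": minimum,
        "initial": initial,
        "maximum": maximum,
    }


# Only keywords documented in the parameter list are used here.
cluster_kwargs = {
    "n_workers": [2, 4, 10],             # [min, initial, max]
    "worker_memory": ["8GiB", "16GiB"],  # memory range per worker
    "spot_policy": "spot_with_fallback",
    "idle_timeout": "30 minutes",
    "wait_for_workers": 0.5,             # wait for half of the workers
    "tags": {"project": "demo"},         # placeholder tag
}


def start_cluster():
    """Create the cluster; needs the coiled package and a configured account."""
    import coiled  # imported lazily on purpose

    return coiled.Cluster(**cluster_kwargs)


plan = describe_n_workers(cluster_kwargs["n_workers"])
```

Keeping the keyword arguments in a plain dictionary makes it easy to review or log the requested configuration before any cloud resources are created.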

adapt(Adaptive=<class 'coiled.cluster.CoiledAdaptive'>, *, minimum=1, maximum=200, target_duration='3m', wait_count=24, interval='5s', **kwargs)

Dynamically scale the number of workers in the cluster based on scaling heuristics.

Parameters:

minimum (int) – Minimum number of workers that the cluster should have while on low load, defaults to 1.
maximum (int) – Maximum number of workers that the cluster should have while on high load.
wait_count (int) – Number of consecutive times that a worker should be suggested for removal before the cluster removes it.
interval (timedelta or str) – Milliseconds between checks, defaults to 5000 ms.
target_duration (timedelta or str) – Amount of time we want a computation to take. This affects how aggressively the cluster scales up.

Return type:

Adaptive
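
As a rough illustration, adaptive scaling might be switched on through a small wrapper like the one below. `enable_adaptive` is a hypothetical helper, and the bounds check is an addition for the example rather than documented behaviour. The keyword names and the values for `target_duration`, `interval` and `wait_count` are taken from the signature above.

```python
def enable_adaptive(cluster, low: int = 2, high: int = 50):
    """Call cluster.adapt() with explicit bounds and the documented defaults."""
    if low > high:
        raise ValueError("minimum must not exceed maximum")
    return cluster.adapt(
        minimum=low,
        maximum=high,
        target_duration="3m",  # same as the signature default
        interval="5s",         # same as the signature default
        wait_count=24,         # same as the signature default
    )


# Usage, assuming `cluster` is an existing coiled.Cluster:
# adaptive = enable_adaptive(cluster, low=1, high=20)
```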

property asynchronous: Are we running in the event loop?

close(force_shutdown=False, reason=None)

Close the cluster.

Return type:

Optional[Awaitable[None]]

property details_url: URL for cluster on the web UI at cloud.coiled.io.

get_client()

Return client for the cluster

If a client has already been initialized for the cluster, return that; otherwise, initialize a new client object.
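
A short sketch of how the returned client might be used. It assumes the object returned by `get_client()` behaves like a `distributed.Client`, so that `submit` returns a future with a `result()` method; `run_on_cluster` is a hypothetical helper.

```python
def run_on_cluster(cluster, func, *args):
    """Submit func(*args) through the cluster's client and wait for the result."""
    # Per the description above, an already-initialized client is reused.
    client = cluster.get_client()
    future = client.submit(func, *args)  # assumption: distributed.Client API
    return future.result()


# Usage, assuming `cluster` is an existing coiled.Cluster:
# total = run_on_cluster(cluster, sum, [1, 2, 3])
```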

get_logs(scheduler=True, workers=True)

Return logs for the scheduler and workers

Parameters:

scheduler (bool) – Whether or not to collect logs for the scheduler
workers (bool) – Whether or not to collect logs for the workers

Returns:

logs – A dictionary of logs, with one item for the scheduler and one for the workers

Return type:

Dict[str]
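
The returned dictionary could be summarized with a helper along these lines. This is only a sketch: it assumes each value in the dictionary is a single string of log text, which the reference above does not state, and `summarize_logs` is a hypothetical name.

```python
def summarize_logs(logs: dict) -> dict:
    """Return the number of log lines per entry (scheduler or worker)."""
    # Assumption: each value is one string containing newline-separated lines.
    return {name: len(text.splitlines()) for name, text in logs.items()}


# Usage, assuming `cluster` is an existing coiled.Cluster:
# counts = summarize_logs(cluster.get_logs(scheduler=True, workers=False))
```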

get_spark(block_till_ready=True, spark_connect_config=None, executor_memory_factor=None, worker_memory_factor=None)

Get a spark client. Experimental and subject to change without notice.

To use this, start the cluster with coiled.spark.get_spark_cluster.

spark_connect_config – Optional dictionary of additional config options. For example, {"spark.foo": "123"} would be equivalent to --config spark.foo=123 when running spark-submit --class spark-connect.
executor_memory_factor – Determines spark.executor.memory based on the available memory; can be any value between 0 and 1. Default is 1.0, giving all available memory to the executor.
worker_memory_factor – Determines --memory for org.apache.spark.deploy.worker.Worker; can be any value between 0 and 1. Default is 1.0.

async recommendations(target)

Make scale up/down recommendations based on current state and target.

Return a recommendation of one of these forms:

- {“status”: “same”}
- {“status”: “up”, “n”: <desired number of total workers>}
- {“status”: “down”, “workers”: <list of workers to close>}

Return type:

dict
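
The three recommendation shapes can be handled with a simple dispatch on the "status" key. The helper below is an illustrative sketch; `describe_recommendation` is a hypothetical name, and the worker identifiers in the usage comment are placeholders.

```python
def describe_recommendation(rec: dict) -> str:
    """Turn a recommendation dictionary into a readable sentence."""
    status = rec["status"]
    if status == "same":
        return "keep the current workers"
    if status == "up":
        return f"scale up to {rec['n']} workers in total"
    if status == "down":
        names = ", ".join(str(worker) for worker in rec["workers"])
        return f"close {len(rec['workers'])} worker(s): {names}"
    raise ValueError(f"unexpected status: {status!r}")


# Usage with hand-written dictionaries in the documented shapes:
# describe_recommendation({"status": "down", "workers": ["w1", "w2"]})
```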

scale(n, force_stop=True)