[SRU] Backport feature for disabling migration to Noble and Plucky

Bug #2122551 reported by Quang Ngo
This bug affects 1 person
Affects                 Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive    New           Undecided   Unassigned
  Epoxy                 New           Undecided   Unassigned
watcher (Ubuntu)        Fix Released  Undecided   Unassigned
  Noble                 New           Undecided   Unassigned
  Plucky                New           Undecided   Unassigned
  Questing              Fix Released  Undecided   Unassigned

Bug Description

Watcher upstream has added a feature to the Host Maintenance strategy that can disable live or cold migration and safely stop active instances when migration cannot proceed. The feature is planned for Watcher 15.0.0, shipped with OpenStack 2025.2, but it would be useful to backport it to Ubuntu Plucky and Noble.

For example, Sunbeam/Canonical OpenStack currently relies mainly on Watcher rock images pulled from Ubuntu Cloud Archive Epoxy (watcher 2:14.0.0) and Caracal (watcher 2:12.0.0). This feature would help address the following known issue: https://canonical-openstack.readthedocs-hosted.com/en/latest/how-to/operations/maintenance-mode/#known-issues

Upstream commit: https://opendev.org/openstack/watcher/commit/cc26b3b334e5d60bf04c927c771d572445e4a8bc

[ Impact ]
 * User problem: The current Host Maintenance strategy forces migration operations that may not be suitable for all deployment scenarios.

 * Functional Enhancement: Introduces two new input parameters to the Host Maintenance strategy:
  - `disable_live_migration`: When True, forces cold migration instead of live migration
  - `disable_cold_migration`: When True, prevents cold migration of inactive instances
  - Combined usage: When both are True, only stop actions are performed on active instances

 * Backward Compatibility: All changes are additive with sensible defaults (both new parameters default to False), ensuring existing Host Maintenance strategy deployments continue working unchanged.

 * No API Changes: The feature only adds input parameters to the existing strategy schema and internal action handling. There are no API modifications or breaking changes to existing interfaces.

 * Target Releases:
   - Ubuntu 25.04 (Plucky) with watcher 2:14.0.0 -> enables UCA Epoxy
   - Ubuntu 24.04 (Noble) with watcher 2:12.0.0
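
The parameter semantics described above can be summarised as a small decision function. This is only an illustrative sketch based on the behaviour stated in this report, not the upstream implementation: the function name `planned_action` and the returned labels are hypothetical, and the case of an inactive instance with cold migration disabled is assumed to produce no action.

```python
from typing import Optional


def planned_action(
    is_active: bool,
    disable_live_migration: bool = False,
    disable_cold_migration: bool = False,
) -> Optional[str]:
    """Return a label for the action expected for one instance.

    Hypothetical helper: the labels are illustrative and do not
    correspond to Watcher's internal action names.
    """
    if is_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            # Live migration is disabled, so fall back to cold migration.
            return "cold_migrate"
        # Both migration types are disabled: stop the active instance.
        return "stop"
    if not disable_cold_migration:
        return "cold_migrate"
    # Assumption: an inactive instance is left untouched when cold
    # migration is disabled.
    return None
```

With both parameters left at their default of False, the function returns the existing behaviour (live migration for active instances, cold migration for inactive ones), which is the backward-compatibility property claimed above.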

[ Test Case ]

 Prerequisites:
  * OpenStack cluster with Watcher enabled
  * At least two compute nodes in the cluster
  * Test instances running on the maintenance target node

 1. Test Case 1: Backward Compatibility
 # Verify existing behavior is unchanged
 openstack optimize audit create -g cluster_maintaining -s host_maintenance \
   -p maintenance_node=compute01 -p backup_node=compute02

 # Expected: Traditional live/cold migration behavior (no stop actions)
 openstack optimize actionplan show <actionplan_uuid>

 2. Test Case 2: Both Migrations Disabled
 # Test stop-only behavior (the new stop action)
 openstack optimize audit create -g cluster_maintaining -s host_maintenance \
   -p maintenance_node=compute01 -p disable_live_migration=True \
   -p disable_cold_migration=True

 # Expected: Action plan contains only "stop" actions for instances
 openstack optimize actionplan show <actionplan_uuid>

 3. Test Case 3: Live Migration Disabled Only
 # Test cold migration fallback
 openstack optimize audit create -g cluster_maintaining -s host_maintenance \
   -p maintenance_node=compute01 -p disable_live_migration=True

 # Expected: Both active and inactive instances use cold migration
 openstack optimize actionplan show <actionplan_uuid>

 4. Test Case 4: Cold Migration Disabled Only
 # Test live migration with no cold migration
 openstack optimize audit create -g cluster_maintaining -s host_maintenance \
   -p maintenance_node=compute01 -p disable_cold_migration=True

 # Expected: Active instances use live migration, inactive instances remain untouched
 openstack optimize actionplan show <actionplan_uuid>

Backward compatibility can also be verified with Tempest in the Ubuntu OpenStack CI system.
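
The expected results of the four test cases can be condensed into a lookup table, which may help when reviewing an action plan by hand or in a script. This is a minimal sketch under stated assumptions: the action labels (`live_migrate`, `cold_migrate`, `stop`) and the helper `unexpected_actions` are hypothetical and would need to be mapped onto the action types actually shown in the action plan.

```python
# Allowed action labels per (disable_live_migration, disable_cold_migration),
# taken from the "Expected" comments of the four test cases above.
# The labels themselves are illustrative, not Watcher's own names.
EXPECTED_ACTIONS = {
    (False, False): {"live_migrate", "cold_migrate"},  # Test Case 1
    (True, True): {"stop"},                            # Test Case 2
    (True, False): {"cold_migrate"},                   # Test Case 3
    (False, True): {"live_migrate"},                   # Test Case 4
}


def unexpected_actions(actions, disable_live_migration=False,
                       disable_cold_migration=False):
    """Return the sorted labels in `actions` that the test case does not allow."""
    allowed = EXPECTED_ACTIONS[(disable_live_migration, disable_cold_migration)]
    return sorted(set(actions) - allowed)
```

An empty result means the plan matches the expectation for that parameter combination; any returned label points at an action that should be investigated before the plan is executed.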

[ Regression Potential / Where problems could occur ]

 Configuration Conflicts:
 * Risk: Administrators might misconfigure parameters, leading to unexpected behavior
 * Manifestation: Instances stopped when migration was intended, or vice versa
 * Detection: Review action plans before execution; monitor Watcher logs for parameter validation

 Stop Action Failures:
  * Risk: New stop action might fail on instances with complex configurations (attached volumes, special networking, etc.)
  * Manifestation: Action plan execution failures; instances left in inconsistent states
  * Detection: Failed action plan execution; Nova API errors in Watcher applier logs

 Otherwise, the regression risk is low: there are no API changes and both new parameters default to False. All existing upstream CI jobs have passed.

[ Discussion ]
 N/A

Tags: patch

description: updated
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp2122551_plucky.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Hemanth Nakkina (hemanth-n) wrote :
Changed in watcher (Ubuntu Questing):
status: New → Fix Released