GitLab Elasticsearch integration support for in-cluster re-indexing
<!-- The first three sections: "Problem to solve", "Intended users" and "Proposal", are strongly recommended, while the rest of the sections can be filled out during the problem validation or breakdown phase. However, keep in mind that providing complete and relevant information early helps our product team validate the problem and start working on a solution. -->

### Problem to solve

We are planning to implement the ability to re-index everything in GitLab to a new cluster/index in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17230, but this may not always be the most efficient option. There are often cases where we'll just want to do a straight re-indexing inside the Elasticsearch cluster. Re-indexing using all of GitLab's code/data may be considerably more costly and put more strain on other systems like Postgres, Redis, and Sidekiq, when the re-indexing could be done entirely in Elasticsearch.

Doing a [re-indexing in Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html) is certainly more limited than the implementation in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17230, since it only covers cases where the index options have changed and not cases where the application code surrounding them has changed. In those cases, which may be somewhat frequent, it will likely be the most efficient option.

### Intended users

<!-- Who will use this feature? If known, include any of the following: types of users (e.g. Developer), personas, or specific company roles (e.g. Release Manager). It's okay to write "Unknown" and fill this field in later.
Personas are described at https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/

* [Rachel (Release Manager)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#rachel-release-manager)
* [Parker (Product Manager)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#parker-product-manager)
* [Delaney (Development Team Lead)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#delaney-development-team-lead)
* [Sasha (Software Developer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sasha-software-developer)
* [Presley (Product Designer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#presley-product-designer)
* [Devon (DevOps Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#devon-devops-engineer)
* [Sidney (Systems Administrator)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sidney-systems-administrator)
* [Sam (Security Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sam-security-analyst)
* [Dana (Data Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#dana-data-analyst)
* [Simone (Software Engineer in Test)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#simone-software-engineer-in-test)
* [Allison (Application Ops)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#allison-application-ops)
-->

### Further details

Related to:

- https://gitlab.com/gitlab-org/gitlab/issues/204826
- https://gitlab.com/gitlab-org/gitlab/-/issues/213628
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1907

### Proposal

If we have https://gitlab.com/gitlab-org/gitlab/issues/204826 and https://gitlab.com/gitlab-org/gitlab/-/issues/213628 we could add a feature to GitLab admin UI which pauses indexing, creates
a new index, triggers a re-index, and then swaps aliases when it's done. This would fully automate the process in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1907.

This could actually just be a single button called "reindex in cluster" which automates everything. It could show the task ID of the reindexing process and display whatever progress we can get from the task API.

This feature would need to use a lock to avoid it being pressed twice. Once the button is pressed, and until the reindex is completed, the button will be disabled and the progress indicator will be displayed.

This feature could be used by anyone who wants to roll out optional index settings changes that we release.

One thing worth noting is that we've run into several issues in the past with the reindex API not being very robust against errors, so we'd need to address this somehow. We can of course abort the operation if an error occurs, but it's likely that we will need to provide configuration options to the reindex in order for it to work under all circumstances.

Previous errors:

1. `search_context_missing_exception` => https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1902#note_318813682 => possibly the [`scroll`](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-api-query-params) was timing out and needed to be configured for a longer time
2. `Remote responded with a chunk that was too large. Use a smaller batch size.` => https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1862#note_315666096 => needed a smaller batch [`source.size`](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-api-request-body) configured

It's not clear to me whether these settings actually need to be tweaked unless you are reindexing to remote, and this issue will only implement re-indexing within the same cluster.

### Permissions and Security

<!-- What permissions are required to perform the described actions?
Are they consistent with the existing permissions as documented for users, groups, and projects as appropriate? Is the proposed behavior consistent between the UI, API, and other access methods (e.g. email replies)? -->

### Documentation

<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html

* Add all known Documentation Requirements in this section. See https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements
* If this feature requires changing permissions, update the permissions document. See https://docs.gitlab.com/ee/user/permissions.html
-->

### Availability & Testing

We believe this warrants a new end-to-end UI test that:

* Starts an in-cluster reindex
* Then makes sure the index button is locked
* Waits for completion (How will we know it's done? Will the button become unlocked? Also, is there a way to tell via the API?)
* Asserts a search completes properly.

If we're able to trigger this via the API then we'll add an API test that:

* Starts an in-cluster reindex
* Verifies a second call to reindex replies with a sensible message like "reindexing in progress".
* Waits for completion
* Asserts an API search completes properly.

Since this change only affects the Elasticsearch integration page UI and perhaps the API, running the above tests should suffice for QA. There is no need to verify with a full `package-and-qa` run.

<!-- This section needs to be retained and filled in during the workflow planning breakdown phase of this feature proposal, if not earlier. What risks does this change pose to our availability? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing? Please list the test areas (unit, integration and end-to-end) that needs to be added or updated to ensure that this feature will work as intended. Please use the list below as guidance.
* Unit test changes
* Integration test changes
* End-to-end test change

See the test engineering planning process and reach out to your counterpart Software Engineer in Test for assistance: https://about.gitlab.com/handbook/engineering/quality/test-engineering/#test-planning -->

### What does success look like, and how can we measure that?

<!-- Define both the success metrics and acceptance criteria. Note that success metrics indicate the desired business outcomes, while acceptance criteria indicate when the solution is working correctly. If there is no way to measure success, link to an issue that will implement a way to measure this. -->

### What is the type of buyer?

<!-- What is the buyer persona for this feature? See https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/buyer-persona/ In which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers -->

### Is this a cross-stage feature?

<!-- Communicate if this change will affect multiple Stage Groups or product areas. We recommend always start with the assumption that a feature request will have an impact into another Group. Loop in the most relevant PM and Product Designer from that Group to provide strategic support to help align the Group's broader plan and vision, as well as to avoid UX and technical debt. https://about.gitlab.com/handbook/product/#cross-stage-features -->

### Links / references

Reindex API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
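
For reference, the sequence described under Proposal (create a new index, start a reindex as a background task, report progress from the task API, then swap aliases) could map onto Elasticsearch REST calls roughly as follows. This is only a sketch, not a proposed implementation: the function names `reindex_plan` and `task_progress`, the default `scroll` and `batch_size` values, and the tuple layout are all hypothetical, and pausing/resuming GitLab indexing is left out because it happens on the GitLab side. The endpoints themselves (`PUT /<index>`, `POST /_reindex`, `POST /_aliases`) and the `scroll` and `source.size` options are the ones documented for the Reindex API.

```python
def reindex_plan(alias, old_index, new_index, index_body, scroll="5m", batch_size=1000):
    """Return the ordered (method, path, body) calls for an in-cluster reindex.

    Hypothetical helper: it only builds request descriptions and sends nothing.
    """
    return [
        # 1. Create the destination index with the new settings/mappings.
        ("PUT", f"/{new_index}", index_body),
        # 2. Start the reindex as a background task. `scroll` and `source.size`
        #    are the two options that caused the errors listed under Proposal.
        (
            "POST",
            f"/_reindex?wait_for_completion=false&scroll={scroll}",
            {
                "source": {"index": old_index, "size": batch_size},
                "dest": {"index": new_index},
            },
        ),
        # 3. Once the task has completed, move the alias in one atomic request.
        (
            "POST",
            "/_aliases",
            {
                "actions": [
                    {"remove": {"index": old_index, "alias": alias}},
                    {"add": {"index": new_index, "alias": alias}},
                ]
            },
        ),
    ]


def task_progress(task_response):
    """Estimate progress (0.0 to 1.0) from a `GET /_tasks/<task_id>` response."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        return 0.0
    done = status.get("created", 0) + status.get("updated", 0) + status.get("deleted", 0)
    return done / total
```

The reindex call returns a task ID when `wait_for_completion=false` is set, which is what the UI could display. The alias swap in step 3 should only be sent after the task reports `completed: true`, and the lock described under Proposal would need to be held across all three steps.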