[Operate-first-users] OperateFirst MOC OpenShift cluster re-install scheduled for Monday 2021-03-08

Tom Coufal tcoufal at redhat.com
Wed Mar 3 14:44:36 EST 2021


Hello Operate First users,

We’re reaching out to you because you are users of the MOC OpenShift
cluster at cnv.massopen.cloud. This message is important to you whether you
are a direct user of the OpenShift cluster or an ODH/Kubeflow user.

As you may have noticed, our cluster has been facing unprecedented issues
lately. The overall cluster stability has been compromised for the past two
weeks. We have been working continuously with Red Hat support as well as
OpenShift engineers, but the investigation has so far yielded little
progress towards a resolution.

We have decided to tear down the cluster completely and install a new
cluster from scratch on the same hardware on Monday 2021-03-08. Expect all
services to be unavailable during the re-install and all data stored in
the cluster to be deleted. We will migrate all of the current workloads
tracked in the OperateFirst git-ops repositories, as well as all the
connected ArgoCD projects, to this new cluster.

The current cnv.massopen.cloud cluster will cease to exist, and the new
cluster that replaces it will use a different name: zero.massopen.cloud.
Please watch out for any changes to external routes/URLs.

Please be advised that this new cluster will not be a production-grade
cluster. It will intentionally be treated as a test bed: expect it to be
experimented with and prone to failures.

Additionally, we plan to roll out another cluster by the end of the month
(March 2021). This cluster will serve as a more stable, community-managed,
production-like environment with documented SLOs but no guaranteed SLAs.


Migrating your workloads

We do not plan on migrating any data for you, so we advise you to back up
any data you currently store in the MOC cluster. That includes PVC users,
S3/Ceph users, and databases.

If you are a PVC user, we suggest that you either commit the data to your
GitHub repositories with git from JupyterHub or consult the OpenShift
documentation on how to copy/download the data directly:

https://docs.openshift.com/container-platform/4.5/nodes/containers/nodes-containers-copying-files.html
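
As a rough illustration of the direct copy option, the sketch below wraps
the `oc rsync` command described in the documentation above. It assumes the
`oc` CLI is installed and logged in to the cluster and that the pod mounting
your PVC is running. The helper names, pod name, mount path and namespace
are hypothetical placeholders, not values taken from the cluster.

```python
import subprocess
from pathlib import Path


def build_rsync_command(pod, remote_dir, local_dir, namespace=None):
    """Return the `oc rsync` argument list that copies remote_dir out of a pod."""
    cmd = ["oc", "rsync"]
    if namespace:
        # -n selects the project (namespace) the pod lives in.
        cmd += ["-n", namespace]
    # `oc rsync` addresses a directory inside a pod as <pod-name>:<path>.
    cmd += [f"{pod}:{remote_dir}", str(local_dir)]
    return cmd


def backup_pvc(pod, remote_dir, local_dir, namespace=None):
    """Copy a PVC-backed directory from a running pod to the local machine."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if `oc rsync` exits with an error.
    subprocess.run(
        build_rsync_command(pod, remote_dir, local_dir, namespace),
        check=True,
    )
```

For example, `backup_pvc("my-notebook-pod", "/opt/app-root/src",
"./pvc-backup", namespace="my-project")` would copy that directory to your
machine, with all four values replaced by your own.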

If you are an S3/Ceph user, we advise you to migrate your data to a more
permanent and resilient S3 storage facility, like MOC Swift. For further
information on how you can get started, please consult this GitHub issue:

https://github.com/open-infrastructure-labs/ops-issues/issues/33
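
If you only need a local safety copy of a bucket before the re-install, a
minimal sketch along the following lines may help. It is not the MOC Swift
migration described in the issue above; it simply downloads every object in
a bucket to a local directory. It assumes an S3-compatible client such as
one created with boto3. The function name, endpoint, credentials and bucket
name are placeholders.

```python
from pathlib import Path


def download_bucket(s3_client, bucket, dest_dir):
    """Download every object in `bucket` into `dest_dir`; return the keys saved."""
    dest = Path(dest_dir)
    saved = []
    # list_objects_v2 returns results in pages, so iterate with a paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "folder" placeholder objects
            target = dest / key
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(target))
            saved.append(key)
    return saved


# Usage sketch (all values are placeholders, boto3 must be installed):
#   import boto3
#   client = boto3.client(
#       "s3",
#       endpoint_url="https://<your-s3-endpoint>",
#       aws_access_key_id="<access-key>",
#       aws_secret_access_key="<secret-key>",
#   )
#   download_bucket(client, "my-bucket", "./s3-backup")
```

The client is passed in as an argument so the same function works against
whichever S3-compatible endpoint your credentials belong to.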


CNV cluster outage tracking

https://github.com/operate-first/SRE/issues/62

https://github.com/operate-first/support/issues/90

https://access.redhat.com/support/cases/#/case/02869943


Progress and tracking of the new cluster installation

https://github.com/operate-first/SRE/issues/108

The whole installation and setup process will be streamed and recorded on
our YouTube channel:

https://www.youtube.com/channel/UCe87bwqlGoBQs2RvMQZ5_sg


The OperateFirst Ops team