
Zero Downtime Migrations

The following page contains information about writing zero-downtime migrations. If you are interested in upgrading Saleor, please visit the upgrading guide.

The problem

When migrating production databases, we encountered the problem of long-running data migrations causing downtime. We needed to address this problem while still allowing ourselves to create data migrations for new functionalities.

Solution

We wanted to provide a way to safely migrate the database without interrupting the work of already running web workers. To achieve this, we need to ensure that:

  • Old code is compatible with the new database schema (and the new database schema is compatible with the old code).
  • Migrations don't lock tables, rows, etc. for longer than one second.

Writing migrations

This section covers the most common cases you can encounter while writing migrations. Please remember that the examples shown are only meant to give you an idea of how a specific migration should be written.

Adding a new table to the schema

Adding a new table to the schema is backward compatible with the old code. No additional steps are needed.

Adding a new field to the schema

To add a new field to the database, we must ensure at least one of the following:

  • field is nullable
  • field has a default value

Please note that Django doesn't propagate default values to the database; that's why you need to add an SQL statement that manually sets the default value at the database level, so that rows inserted by the old code still get a value:

operations = [
    migrations.AddField(
        model_name="transactionevent",
        name="psp_reference",
        field=models.CharField(blank=True, default="", max_length=512),
    ),
    migrations.RunSQL(
        """
        ALTER TABLE payment_transactionevent
        ALTER COLUMN psp_reference
        SET DEFAULT '';
        """,
        migrations.RunSQL.noop,
    ),
]

Adding a field that cannot be nullable and for which a default value cannot be set manually requires additional steps. To add such a new field in Saleor version X, follow the steps below (a sketch of the migrations follows the list):

e.g. Adding new_field to Saleor version 3.14:

  • In version 3.13 add new_field with null=True.
  • In version 3.13 add logic in code that covers writing to new_field.
  • In version 3.13 add a migration that fills the nullable values. Please check data migration.
  • In version 3.14 change new_field to be non-nullable.
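
A minimal sketch of the two schema migrations from the steps above (the model name example and the field length are hypothetical; only the operations lists are shown):

from django.db import migrations, models

# Version 3.13 migration: add the field as nullable so the old code keeps working.
operations = [
    migrations.AddField(
        model_name="example",
        name="new_field",
        field=models.CharField(max_length=255, null=True),
    ),
]

# Version 3.14 migration: after the data migration has backfilled every row,
# alter the field to be non-nullable.
operations = [
    migrations.AlterField(
        model_name="example",
        name="new_field",
        field=models.CharField(max_length=255),
    ),
]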

Renaming field

To rename a field, you need to first add a new field with the new name and then migrate the data between them.

e.g. Rename field old_field_name to new_field_name in Saleor version 3.14:

  • In version 3.13 add a new field new_field_name.
  • In version 3.13 add logic in code that covers writing to both new_field_name and old_field_name (see the sketch after this list).
  • In version 3.13 migrate data from old_field_name to new_field_name. Please check data migration.
  • In version 3.14 update logic to use only new_field_name.
  • From version 3.14 remove old_field_name. Please check removing field.
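
The dual-write step can be sketched roughly like this (a hypothetical Example model with a save() override; any mechanism that keeps both columns in sync works):

from django.db import models


class Example(models.Model):
    old_field_name = models.CharField(max_length=255, blank=True, default="")
    new_field_name = models.CharField(max_length=255, blank=True, default="")

    def save(self, *args, **kwargs):
        # While both columns exist, mirror every write of the old field into
        # the new one, so the data migration only needs to backfill rows
        # written before this code was deployed.
        self.new_field_name = self.old_field_name
        super().save(*args, **kwargs)

Keep in mind that QuerySet.update() and bulk operations bypass save(), so such writes need the same treatment.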

Change field type without renaming

Similarly to renaming a field, you first need to add a new field, but there are some differences.

e.g. Change field type of number from int to string in Saleor version 3.14:

  • In version 3.13 add a new field number_string.
  • In version 3.13 add logic in code that covers writing to both number and number_string (see the sketch below).
  • In version 3.13 migrate data from number to number_string. Please check data migration.
  • From version 3.14 remove number. Please check removing field.
  • In version 3.15 add a new field number back.
  • In version 3.15 add logic in code that covers writing to both number and number_string.
  • In version 3.15 migrate data from number_string to number. Please check data migration.
  • From version 3.16 remove number_string. Please check removing field.

Please note that the process can be simplified if we can keep the new name for the field; in this example that would be number_string.
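
The dual-write step for a type change is analogous to the one for renaming; a minimal sketch, assuming a hypothetical Example model:

from django.db import models


class Example(models.Model):
    number = models.IntegerField(null=True, blank=True)
    number_string = models.CharField(max_length=32, blank=True, default="")

    def save(self, *args, **kwargs):
        # While both columns exist, also store the integer value as its
        # string representation, so old and new code read consistent data.
        if self.number is not None:
            self.number_string = str(self.number)
        super().save(*args, **kwargs)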

Removing field

To remove a field from the schema, you first need to remove it from the ORM, and only then can you proceed with removing it from the database.

e.g. Removing field old_field in Saleor version 3.14:

  • In version 3.14 drop the field from the ORM and ensure that the field is nullable or has a default value:
operations = [
    migrations.SeparateDatabaseAndState(
        database_operations=[
            migrations.AlterField(
                model_name="example",
                name="old_field",
                field=models.CharField(blank=True, null=True),
            ),
        ],
        state_operations=[
            migrations.RemoveField(
                model_name="example",
                name="old_field",
            ),
        ],
    )
]
  • In version 3.15 drop the field from the database:
operations = [
    migrations.SeparateDatabaseAndState(
        database_operations=[
            migrations.RunSQL(
                sql="""
                ALTER TABLE app_example
                DROP COLUMN old_field;
                """,
                reverse_sql="""
                ALTER TABLE app_example
                ADD COLUMN old_field
                VARCHAR(512);
                """,
            ),
        ],
    )
]

Adding the index to the database

Creating an index can lock the table for several hours. To avoid such a scenario, you need to create the index using the concurrently option. Example:

from django.db import migrations
from django.db.models import Index
from django.contrib.postgres.operations import AddIndexConcurrently


class Migration(migrations.Migration):
    atomic = False

    operations = [
        AddIndexConcurrently(
            model_name="user",
            index=Index(fields=["city_id"], name="account_user_city_id_index"),
        ),
    ]

Please note that the line atomic = False is needed to proceed with concurrent index creation. Please don't use non-atomic migrations for other purposes.

Removing the index from the database

Similarly to adding the index to the database, removing the index also needs the concurrently option.

from django.db import migrations
from django.contrib.postgres.operations import RemoveIndexConcurrently


class Migration(migrations.Migration):
    atomic = False

    operations = [
        RemoveIndexConcurrently(
            model_name="user",
            name="account_user_city_id_index",
        ),
    ]

Please note that the line atomic = False is needed to proceed with concurrent index removal. Please don't use non-atomic migrations for other purposes.

Adding foreign key to the database

Adding a ForeignKey in Django creates an index on that field even though it is not specified explicitly in the Model definition. You can check this behavior by running the sqlmigrate command. To avoid creating the index, or to be able to change its creation to concurrently as described in Adding the index to the database, you need to disable the automatic index creation by setting db_index=False and move the index definition to Meta.indexes, e.g.:

from django.contrib.postgres.indexes import BTreeIndex
from django.db import models


class OrderEvent(models.Model):
    ...
    related = models.ForeignKey(
        "self",
        blank=True,
        null=True,
        on_delete=models.SET_NULL,
        related_name="related_events",
        db_index=False,
    )

    class Meta:
        indexes = [
            BTreeIndex(fields=["related"], name="order_orderevent_related_id_idx")
        ]

With this, you can follow Adding the index to the database. The migration containing the ForeignKey creation should be separate from the migration that creates the index concurrently.
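
For example, the follow-up migration that builds the index concurrently could look roughly like this (a sketch; the dependency on the migration that added the ForeignKey is omitted):

from django.contrib.postgres.indexes import BTreeIndex
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations


class Migration(migrations.Migration):
    atomic = False

    operations = [
        AddIndexConcurrently(
            model_name="orderevent",
            index=BTreeIndex(
                fields=["related"], name="order_orderevent_related_id_idx"
            ),
        ),
    ]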

Data migration

A data migration can lock the table for several hours. To avoid such a scenario, you need to delegate the logic to a Celery worker. The Celery task should be placed in saleor/<module_name>/migrations/tasks/saleor<X>.py, where X is the Saleor version, e.g. a task that migrates order data in Saleor version 3.13 should be placed in saleor/order/migrations/tasks/saleor3_13.py.

To call a task inside a migration, use the post_migrate signal, e.g.:

from django.db import migrations
from django.db.models.signals import post_migrate
from django.apps import apps as registry

from .tasks.saleor3_13 import migration_task


def migration(apps, _schema_editor):
    def on_migrations_complete(sender=None, **kwargs):
        migration_task.delay()

    sender = registry.get_app_config("order")
    post_migrate.connect(on_migrations_complete, weak=False, sender=sender)


class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.RunPython(migration, migrations.RunPython.noop)
    ]

The following conditions must be met regarding the migration task:

  • a single task execution should not take longer than one second
  • the task should be idempotent
  • the task should check if there is data to be migrated (e.g. do not start calculations if the table is empty)
  • the task should be concurrency-safe (e.g. execution of the same task by multiple workers should not end in a deadlock)
  • the task should queue itself if there is still data to be processed
  • the task should process data from newest to oldest

E.g. from the Saleor core code, a task that changes the type of OrderEvent from transaction_void_requested to transaction_cancel_requested:

from ....celeryconf import app
from ...models import OrderEvent

from django.db import transaction
from django.db.models import QuerySet

# Batch size that makes sure the task completes within one second
# and doesn't use too much memory.
BATCH_SIZE = 5000


def update_type_to_transaction_cancel_requested(qs: QuerySet[OrderEvent]):
    with transaction.atomic():
        # Lock the batch of objects to avoid deadlocks.
        _events = list(qs.select_for_update(of=["self"]))
        qs.update(type="transaction_cancel_requested")


@app.task
def order_events_rename_transaction_void_events_task():
    # Order events are processed from the newest to the oldest.
    events = OrderEvent.objects.filter(type="transaction_void_requested").order_by(
        "-pk"
    )
    ids = events.values_list("pk", flat=True)[:BATCH_SIZE]
    qs = OrderEvent.objects.filter(pk__in=ids)

    # If we found data, queue the next execution of the task.
    if ids:
        update_type_to_transaction_cancel_requested(qs)
        order_events_rename_transaction_void_events_task.delay()

Task Autodiscovery in Zero Downtime Migrations

In the context of zero downtime migrations, especially for data migrations, tasks might be scattered across different modules or might be version-specific. To ensure that Celery picks up all necessary tasks for execution without missing any, we use the concept of "Additional Autodiscovery with Specified Related Names."

This method of autodiscovery allows for targeted discovery of tasks from specific packages or modules, ensuring that migration tasks specific to certain versions or modules are correctly discovered and executed. For instance, modify the file saleor/celeryconf.py:

app.autodiscover_tasks(
    packages=[
        "saleor.order.migrations.tasks",
    ],
    related_name="saleor3_15",
)