Database support
This document provides details and examples on how to implement database support for a Linux package component. The architecture blueprint provides the design and definitions.
Level 1
- Add the new database-related configuration attributes to
gitlab.rb. Do not forget to updategitlab.rb.template. - Update the Chef recipe to consume the configuration attributes. At this level, the requirement is to pass down the attributes to the component, generally its through configuration files or command-line arguments.
For example in registry cookbook:
-
registry['database']attribute is added togitlab.rb(seeattributes/default.rb). - The configuration template uses the attribute to configure registry (see
templates/default/registry-config.yml.erb).
Level 2
-
Add dependency to
postgresqlandpgbouncercookbooks. Usedependsinmetadata.rb. This ensures that requirements are met and their Chef custom resources are available to the cookbook. -
Create a
database_objectscustom resource inresources/directory of the cookbook with the defaultnothingaction (a no-op action) and acreateaction. Thecreateaction can leverage the existingpostgresqlcustom resources to set up the required database objects for the component.See: -
postgresql_user-postgresql_database-postgresql_schema-postgresql_extension-postgresql_queryA
database_objectsresource of must create all of the required database objects for a component. It must not assume that another cookbook creates some of the objects that it needs. -
Create a
database_migrationscustom resource inresources/directory of the cookbook with the defaultnothingaction (a no-op action) and arunaction. Therunaction executes the commands for database migrations of the component.When the migration runs, it can safely assume that all of the required database objects are available. Therefore this resource depends on successful
createaction ofdatabase_objectsresource. -
In the
defaultrecipe of the cookbook, use adatabase_objectsresource that notifies adatabase_migrationsresources torun. The migrations should be able to runimmediatelyafter the preparation of database objects but a component may choose not to use the immediate trigger.
Examples
All of the following code blocks are provided as examples. You may need to make adjustments to ensure that they meet your requirements.
Example 1: Registry database objects
The following example shows a database_objects resource in the registry
cookbook defined in registry/resources/database_objects.rb.
Notice how it uses custom resources from the postgresql cookbook to create
the required database objects.
# registry/resources/database_objects.rb
unified_mode true
property :pg_helper, [GeoPgHelper, PgHelper], required: true, sensitive: true
default_action :nothing
action :nothing do
end
action :create do
host = node['registry']['database']['host']
port = node['registry']['database']['port']
database_name = node['registry']['database']['database_name']
username = node['registry']['database']['username']
password = node['registry']['database']['password']
postgresql_user username do
password "md5#{password}" unless password.nil?
action :create
end
postgresql_database database_name do
database_socket host
database_port port
owner username
helper new_resource.pg_helper
action :create
end
postgresql_extension 'btree_gist' do
database database_name
action :enable
end
end
Example 2: Registry database migrations
The following example shows a database_migrations resource in the registry
cookbook defined in registry/resources/database_objects.rb.
Notice how the resource accepts additional parameters. Parameters help
support different migration scenarios, for example separation of pre-deployment
or post-deployment migrations. It also uses MigrationHelper
to decide whether to run a migration or not.
# registry/resources/database_migrations.rb
unified_mode true
property :name, name_property: true
property :direction, Symbol, default: :up
property :dry_run, [true, false], default: false
property :force, [true, false], default: false
property :limit, [Integer, nil], default: nil
property :skip_post_deployment, [true, false], default: false
default_action :nothing
action :nothing do
end
action :run do
# MigrationHelper is not implemented. It contains general-purpose helper
# methods for managing migrations, for example if a specific component
# migrations can run or not.
#
# See: "Helper class for general database migration requirements"
migration_helper = MigrationHelper.new(node)
account_helper = AccountHelper.new(node)
logfiles_helper = LogfilesHelper.new(node)
logging_settings = logfiles_helper.logging_settings('registry')
bash_hide_env "migrate registry database: #{new_resource.name}" do
code <<-EOH
set -e
LOG_FILE="#{logging_settings[:log_directory]}/db-migrations-$(date +%Y-%m-%d-%H-%M-%S).log"
umask 077
/opt/gitlab/embedded/bin/registry \
#{new_resource.direction} \
#{"--dry-run" if new_resource.dry_run} \
#{"--limit #{new_resource.limit}" unless new_resource.limit.nil?} \
... \
#{working_dir}/config.yml \
2>& 1 | tee ${LOG_FILE}
STATUS=${PIPESTATUS[0]}
chown #{account_helper.gitlab_user}:#{account_helper.gitlab_group} ${LOG_FILE}
exit $STATUS
EOH
not_if { migration_helper.run_migration?('registry') }
end
end
Example 3: Use database objects and migrations of Registry
The resources defined in the previous examples are used in
registry/recipes/enable.rb recipe.
See how only_if and not_if guards are used to decide when to create the
database objects or run the migrations. Also, pay attention to the way that
notifies is used to show the dependency of the migrations on the successful
creation of database objects.
# registry/recipes/enable.rb
# ...
pg_helper = PgHelper.new(node)
registry_database_objects 'default' do
pg_helper pg_helper
action :create
only_if { node['registry']['database']['enable'] }
not_if { pg_helper.replica? }
notifies :create, 'registry_database_migrations[up]', :immediately if pg_helper.is_ready?
end
registry_database_migrations 'up' do
direction :up
only_if { node['registry']['database']['enable'] }
not_if { pg_helper.replica? }
end
# ...
Example 4: Parameterized database objects resource for Rails
The following example shows how a single implementation of the database objects for Rails application can satisfy the requirements of the decomposed database model.
In this example the logical database is passed as the resource name and is
used to lookup settings of each database from the configuration. The settings
are passed to postgresql custom resources. This is particularly useful when
the majority of implementation of can be reused to replace two or more resources.
# gitlab/resources/database_objects.rb
unified_mode true
property :pg_helper, [GeoPgHelper, PgHelper], required: true, sensitive: true
default_action :nothing
action :nothing do
end
action :create do
global_database_settings = {
# ...
port: node['postgresql']['port']
host: node['gitlab']['gitlab_rails']['db_host']
# ...
}
database_settings = node['gitlab']['gitlab_rails']['databases'][new_resource.resource_name]
database_settings = global_database_settings.merge(database_settings) unless database_settings.nil?
username = database_settings[:username]
password = database_settings[:password]
database_name = database_settings[:database_name]
host = database_settings[:host]
port = database_settings[:port]
postgresql_user username do
password "md5#{password}" unless password.nil?
action :create
end
postgresql_database database_name do
database_socket host
database_port port
owner username
helper new_resource.pg_helper
action :create
end
postgresql_extension 'btree_gist' do
database database_name
action :enable
end
end
And this is how it is used in gitlab/recipes/default.rb:
gitlabe_database_objects 'main' do
pg_helper pg_helper
action :create
only_if { node['gitlab']['gitlab_rails']['databases']['main']['enable'] ... }
end
gitlabe_database_objects 'ci' do
pg_helper pg_helper
action :create
only_if { node['gitlab']['gitlab_rails']['databases']['ci']['enable'] ... }
end
Level 3
-
Add a new attribute for PgBouncer user. Make sure that this attribute is mapped to the existing
pgbouncer['databases']setting and can consume it. This attribute is used to create a dedicated PgBouncer user for the component as opposed to reusing the existing Rails user, the same as what Praefect currently does.NOTE Note: It is very important that we do not introduce any breaking changes to
gitlab.rb. The current user settings must work without any change. -
Use
pgbouncer_usercustom resource frompgbouncercookbook to create the dedicated PgBouncer user for the component. Use the attribute that is described in the previous step.
Level 4
-
Add a new attribute for the component to specify the name of the Consul service of the database cluster. This is either the name of the scope of the Patroni cluster (when automatic service registry for Patroni, i.e.
patroni['register_service'], is enabled) or the name of the Consul service that is configured manually without the Linux package. -
Use
database_watchcustom resource(Needs Implementation) to define a new Consul watch for the database cluster service. It notifies PgBouncer to update the logical database endpoint when the leader of the cluster changes. Pass the name of the Consul service, logical database, and any other PgBouncer options as parameters to the watch.
All database_watch resources must be placed in the consul cookbook. As
opposed to the previous levels, this is the only place where database-related
resources are concentrated in one cookbook, consul, and not managed in
the same cookbooks as their associated components.
The reason for this exception is that the watches run on the PgBouncer nodes,
where pgbouncer_role is used. All components, except PgBouncer and Consul,
are disabled. Note that this is in line with existing user configuration since
it is the recommended configuration for PgBouncer node.
We don’t want to introduce any breaking changes into gitlab.rb.
Considerations
-
No other resource should be involved with database setup.
-
All custom resources must be idempotent. For example they must not fail when an object already exist even though they are created or ran in another cookbook. Instead they must be able to update the current state of the database objects, configuration, or migrations based on the new user inputs.
-
In HA mode, given that multiple physical nodes are involved, the Linux package may encounter certain limitations to provide full automation of the configuration. This is an acceptable limitation.
Bridge the gap
Currently not all of the custom resources or helper classes are available. Even if they are, they may require some adjustments to meet the specified requirements. Here are some examples.
Reorganize the existing database operations
This model requires some adjustments in postgresql, patroni, and gitlab
cookbooks. For example database_objects that is defined in gitlab cookbook
must be used in the same cookbook and its usage must be removed from postgresql
and patroni cookbooks.
The database service cookbooks (postgresql and patroni) should not deal with
database objects and migrations and must delegate them to the application
cookbooks (e.g. gitlab, registry, and praefect). However, to support this,
custom resources of postgresql cookbook must be able to work on any node.
Currently they assume they run on the PostgreSQL node and use the UNIX socket to
connect to the database server. This assumption forces to place all database
operations in one cookbook.
The same is true for pgbouncer cookbook. Currently the only PgBouncer user is
created in the users recipe of this cookbook. This can change as well to allow
each component cookbook to create its own PgBouncer users.
Support dedicated PgBouncer user for databases
The current pgbouncer cookbook mostly supports multiple databases.
The pgbouncer cookbook only creates PgBouncer users for the main Rails database. This
is why non-Rails applications connect with the same PgBouncer user created for Rails.
We can currently set up PgBouncer support for decomposed Rails databases sharing the same user. But for Praefect or Registry, we need additional work to create dedicated PgBouncer users.
NOTE Note: A shared user does not mean connection settings for each database must be the same. It only means that multiple databases use the same user for PgBouncer connection.
Delay the population of PgBouncer database configuration
The implementation of gitlab-ctl pgb-notify supports multiple
databases. It is generic enough that, as long as the PgBouncer users are created,
it can update databases.ini from the databases.json file.
However, when PgBouncer users are pulled into individual cookbooks, the initial
databases.ini that is created or updated in gitlab-ctl reconfigure may not
be valid because it references PgBouncer users that are not created yet.
We should be able to fix this by delaying the action on Chef resource that calls
gitlab-ctl pgb-notify.
Configurable Consul watch for databases
Consul cluster can be shared between multiple Patroni clusters (using different
scopes, such as patroni['scope']), but updating PgBouncer configuration is still
problematic because the Consul watch scripts are not fully configurable.
The current implementation has several limitations:
-
The Linux package uses
postgresqlservice that is explicitly defined inconsulcookbook. This service, that is currently being used to notify PgBouncer, is a leftover of the transition from RepMgr to Patroni. It must be replaced with the Consul service that Patroni registers. Whenpatroni['register_service']is enabled Patroni registers a Consul service withpatroni['scope']parameter and the tagmaster,primary,replica, orstandby-leaderdepending on the node’s role. -
The current failover script is associated to a Consul watch for
postgresqlservice and is not capable of handling multiple databases because database name can not be changed.
We need to extend the current Linux package capability to use Consul watches to track Patroni services, find cluster leaders, and notify PgBouncer with a parameterized failover script.
In order to do this we implement database_watch custom resource in consul
cookbook. It defines a database-specific Consul watch for database a cluster
service and passes the required information to a parameterized failover script
to notify PgBouncer. The key attributes of this resource are:
-
The service name, that specifies which database cluster must be watched. It could be the scope of the Patroni cluster when
patroni['register_service']is enabled or a Consul service name when it is manually configured. -
The database name that specifies which logical databases should be reconfigured when the database cluster leader changes.
Helper class for general database migration requirements
MigrationHelper(Needs implementation) implements general
requirements of database migrations, including the central switch for enabling
or disabling auto-migrations. It can also provides the mapping between the
existing and new configuration attributes.