When building a multi-tenant application there’s one key database structure decision that needs to be made; how will I keep tenant data separate from each other?
Introduction
Tenant data separation is a fundamental principle, you really don’t want one tenant to be able to read another’s data!
There are three well known approaches you can employ: -
- Database
- Schema
- Row Level
We’ll take a look at each and discuss their merits, finishing up with what we think is the best choice.
1. Database
Putting each tenant’s data into their own database ensures there can be no leaking of other tenant data in everyday queries and provides the highest level of separation of the three choices. Backup and restore is straightforward and performance is typically good as the query planner is dealing with a much smaller dataset.
There are some disadvantages though - you will need to build a new database each time a tenant signs up for your service and you won’t be able to make use of database connection pooling meaning you’ll either need to establish and close the connection on every request or hold a giant pool of connections open in memory (hint: this is a very bad idea). Performance and error monitoring becomes harder too as you’ll need to amalgamate logging and metrics across all databases to see an overall picture.
Typically the overhead of maintaining hundreds if not thousands of individual databases will become too great, both practically and from an infrastructure overhead perspective, and when you need to run migrations, these will have to run once per database.
We would only recommend this method of tenant data separation in very special circumstances.
2. Schema
Database schemas subdivide a single database and have many of the advantages of having separate databases and negate some of the disadvantages. Using this approach, each tenant has their own separate schema that contains their tables and data with separation enforced with a single command before each request, setting the schema PostgreSQL should work with.
Working with a single database makes things like connection pooling and performance monitoring much simpler whilst retaining the ability to restore individual tenant datasets as and when needed. You will, however, still need to build a new schema each time a tenant signs up for your service.
There are, however, some scaling issues when using this approach. Once you grow beyond a few hundred schemas tools like pg_dump start performing very slowly, to the point of being unusable. This can make regular database backups using these tools problematic, however, managed services like AWS RDS will backup multi-schema databases just fine (even with thousands of schemas).
With a large number of schemas, database migrations will become more challenging as they won’t fit inside a single database transaction and can take a lengthy period of time to complete. You will need to carefully evaluate each migration and ensure your production code can handle a mixed scenario where some schemas have been migrated and others not.
There’s also a very problematic issue when you have a large number of schemas with memory exhaustion due to PostgreSQL not having any constraints on metadata cache memory usage - we did a deep dive into this issue here.
Whilst the challenges noted above aren’t trivial, they can be overcome. We have worked on Ruby on Rails apps that have thousands of schemas and work well.
3. Row Level
Using a single database and schema, this approach places an ID column in each applicable table so that tenant rows are stamped with their unique id. Adding a where clause to each query ensures you only work on the applicable tenant’s rows. This requires some diligence on the developer’s part to ensure that all queries are correctly scoped to the correct tenant’s data.
This approach also means you don’t need to build databases or schemas each time a new tenant signs up, connection pooling is still in play, and addresses the issues around backups, migrations and metadata cache memory usage experienced when using schemas. It does lack the ability to easily restore individual tenant data using PostgreSQL tools - that’s something you’ll need to implement with your own code should you need it. As all of your tenant data is contained within a single set of tables, you will need to spend time on index optimisation, additionally ensuring that all tenant ID columns are indexed to prevent unwelcome table scanning.
The primary security challenge with row level separation is the need to ensure every query has the appropriate tenant id query. Missing this crucial piece from the query will result in all tenants data being returned - something you never want to happen. With the advent of PostgreSQL 9.5, native Row Level Security was introduced which brings a much more robust methodology to this type of tenant data separation. RLS can apply the tenant id to every query, automatically enforcing tenant data separation at the database. As with schema separation, the developer will need to issue a single command to set the current tenant before proceeding with any database queries, whilst also remembering to reset it at the end. This is easily achieved using an around action in your application controller.
There are a number of excellent articles that look at RLS in detail, here are a couple of our favourites: -
https://pganalyze.com/blog/postgres-row-level-security-ruby-rails
Conclusion
So, which would we choose? Our primary go-to would be Row Level separation using PostgreSQL’s built-in Row Level Security. It offers the best mix of quality of life/performance features and gives us the data separation assurances we need.
Schema level separation would still be a consideration if there were particular needs that meant RLS was unsuitable, and we’d only ever go down the road of database separation in very unique circumstances.
Get in touch
Are you building a multi-tenant application and looking for some advice on your database structure? With over 25 years experience in creating software applications, we can offer businesses a great amount of valuable insight and knowledge when it comes to the set up, structure and maintenance of your system. Contact us today about booking in a software consultancy day or to discuss your project requirements further. You can email us on hello@circle-sd.com or fill in our contact form, we’d love to hear from you.