Databases

Last Updated: 2022-02-01

This document is a collection of the guidelines Grouparoo uses when dealing with Models and Databases internally.

Grouparoo Migration Guidelines

  • No sequential IDs. We use app-generated unique IDs for primary keys (UUIDs) because we are working with large datasets and databases sometimes can't count (I'm looking at you, Redshift!).
  • Every table gets id, createdAt and updatedAt, even if it seems silly now... we will need it eventually!
  • No enums in the database, use VARCHAR(191). Migrating an enum is too hard and database-specific. Of course, we can use ENUMs in the Models.
  • No default values for strings. Instead, Define defaults in the model. If you want to add a default value to a non-null column, you can accomplish this with a 2-part migration: first add the column with a default value (to update all the existing rows), and then remove the default.
  • Do not re-specify default values. For example, setting a default value of null on a nullable string column is redundant.
  • No foreign key constraints in the database... do these checks in the model. But, it's probably a good idea to add a search index to these
  • All guids/uuids are 40 characters long - a real V4 UUID (36 characters) + 3 letter prefix + underscore, ie: app_168c4564-e389-4fbd-8338-db04d62022ba
  • Do not use compound indexes for SQLite. There's a Seqeuelize bug
    • Instead, do the uniqueness validation in the model in a @BeforeSave hook
    • You can continue to add compound indexes for Postgres (do so with a check against config.seqeulize.dialect === 'postgres')
  • All migrations should be within a transation. If part of the migration fails, the whole migration will be rolled back.

Grouparoo Model Guidelines

  • No private methods. This breaks compatibility between Typescript-exported model definitions (to JS) and the TS files themselves. We require this compatibility as plugins both source and export to core... so a single runtime's require path may be core/dist -> plugin/src or even core/dist -> plugin/dist -> core/src
    • Instead, Pretend it's 1999 and prefix methods you want to be private with _, i.e. async _privateMethod()
  • Do not re-specify default values. For example, setting a default value of null on a nullable string column is redundant (eg: @Default(null)).
  • New models need to be enumerated a few places:
    • Index.ts
    • SpecHelper.ts
    • modules/plugin.ts
  • In order to keep models smaller, we make the distinction between the Model itself and ModelOps
    • Table/Column definitions, Class methods, Setters, Getters, and Validations belong on the Model
    • Methods that mutate the Model (or more than one Model), or interact with a Plugin, belong in a ModelOp
    • It is appropriate for convenience to have a Model method call out to a ModelOp (ie: Record.import => RecordOpt.import(record))

Querying

  • As we are working with larger volumes of data, we cannot assume that our querying in batches (via limit & offset) won't change out from under us. To that end, using offset is a code smell. We should instead be ordering by some non-changing column (id, createdAt, etc) and storing the highest value as a highWaterMark for the next subsequent query.
-- bad, what if updated_at changed for some of the users as we query?
select * from users order by updated_at asc limit 100 offset 100

-- good
select * from users order by uuid asc limit 100;
-- next time
select * from users where uuid >= previousValue order by uuid asc limit 100;
  • Do not order by rand() or random(). It's very slow on large datasets. There are other alternatives possible, but may require more than one query.

Avoid Circular Dependencies

  • If you are loading a model in an initializer in the @grouparoo/core project, you need to load it from the file's direct export, not @grouparoo/core's main export. ie: import {App} from '../models/App' not import {App} from '../App'.