Working With Documents

Around the late 00s, document databases like MongoDB or CouchDB started to be hailed as the next big thing. Modeling data in groups of hierarchies rather than as flat related sets, they made an enormous difference in how applications approached data. Although they were never going to replace relational databases, there are many use cases for which document stores are undeniably better-suited than the traditional RDBMS, so document databases are an important class of datastore in their own right.

PostgreSQL's JSONB functionality allows us to blend relational and document approaches by storing JSON documents in traditional tables. There's a lot to recommend a hybrid data model: in modern applications, a greater or lesser proportion of the data is relational and should be stored and accessed as such, but there's often some information that really is best represented as a rich, nested document with a flexible schema. In pure relational terms, data like this can only be represented through tortuous abstractions (if you've never seen a table with a self-joining foreign key representing a hierarchy, consider yourself fortunate) or as unsearchable, unindexable long text or BLOB fields.

The JSONB type is a great solution to this problem, and Massive takes care of the management overhead with a document table API.

Document Tables

Document tables exist for the sole purpose of storing JSONB data. Query them through Massive's API, and you get a JSON document which you can modify and persist, all seamlessly. You don't even need to create them ahead of time until you know you need them.

Standard table functions still work on document tables, and can be quite useful especially for extended document tables! Fields in the document can be searched with regular find and criteria object fields using JSON traversal to look for body.myField.anArray[1].aField.

findDoc is still preferred to JSON queries if at all possible since it uses the @> "contains" operator to leverage indexing on the document body to improve performance.

Primary Key Default Data Type

The default primary key data type used for all tables including document tables is serial, which ordinarily begins at 1 for the first record created and increments by 1 for each subsequent record. However, it is possible to change the primary key data type used for new document tables to uuid (Universally Unique Identifier).

The database must have the uuid-ossp extension installed to use UUID keys.

UUID is a string of 32 hexadecimal digits in five character groups, separated by hyphens. It is commonly used in concurrent or distributed environments, or for extremely large or high-churn tables where the limits on integer size become a concern. There are multiple "versions" of UUIDs, differentiated by the algorithm used to generate them.

Version Description
v1 Partially deterministic, based on the server's MAC address.
v1mc Variation on v1 which uses a random MAC address.
v2 Unsupported.
v3 Based on a namespace and value, and MD5 hashed. Currently unsupported.
v4 The default version, fully pseudorandom.
v5 Like v3, but hashed with SHA-1 instead. Currently unsupported.

If documentPkType is set in the Massive configuration object but uuidVersion is left unspecified, the default is v4. If partitioning or indexing document primary key values, v1mc or v1 is recommended.

massive(connectionInfo, {
  // set UUID primary keys for document tables
  documentPkType: 'uuid',

  // use v1mc generation
  uuidVersion: 'v1mc'
}).then(instance => {...});

Customizing Document Table Columns

Document tables may be extended with new columns and foreign keys. The id type can be changed as well (so long as a default is set, such as uuid_generate_v1mc() or uuid_generate_v4() etc. for UUID types) without impeding usage of document table functions. Just don't remove any columns or change their names, since Massive depends on those.

If you use a migration framework or just want to generate your schema ahead of time, it's easiest to just copy the DDL out of the document table script. Fill in the placeholder tokens:

Name Value
schema Name of the schema owning the document table.
table Name of the document table.
index Document tables created through Massive use schema_table, but you can fill in anything you like.
pkType Either SERIAL or UUID.
pkDefault (Omit if SERIAL pkType) one of the UUID generating functions.

db.saveDoc

The connected database instance has a saveDoc function. Passed a collection name (which can include a non-public schema) and a JavaScript object, this will create the table if it doesn't already exist and write the object to it.

db.saveDoc('reports', {
  title: 'Week 12 Throughput',
  lines: [{
    name: '1 East',
    numbers: [5, 4, 6, 6, 4]
  }, {
    name: '2 East',
    numbers: [4, 4, 4, 3, 7]
  }]
}).then(report => {
  // the reports table has been created and the initial
  // document is assigned a primary key value and
  // returned
});

If the table already exists, you can still use db.saveDoc, but you can also invoke saveDoc on the table itself.

saveDocs can be used for saving multiple documents. The documents being saved must be all new, or all existing.

db.saveDocs('books', [
  {title: 'The Grapes of Wrath'},
  {title: 'The Jungle'}
]).then(console.log);

Querying Documents

A Note About Criteria

Document criteria can be a little more complex to work with. When possible, Massive tries to use a "contains" (@>) operator in order to leverage the index on the document body. Example criteria objects are below.

// A criteria object testing top-level keys will use
// the index
db.docs.findDoc({
  field1: 'value',
  'field2 !=': value
});

// Matching nested values exactly also uses the index,
// although combining multiple top-level conditions
// like this is less efficient. Note that operations
// cannot be used with the inner values -- only equality!
db.docs.findDoc({
  objectfield: {
    innervalue: 123
  },
  arrayfield: [{
    'match': 'inside members of arrayfield'
  }]
});

// Testing values with IN does _not_ use the index
db.docs.findDoc({
  'field1 IN': [1, 2, 3]
});

// Traversal for operations does _not_ use the index
db.docs.findDoc({
  'outer.inner <>': 'nested value'
});

Be careful with criteria which cannot use the index since they may result in poorly-performing queries with sufficiently large tables.

countDoc

Like its counterpart, countDoc returns the number of extant documents matching a criteria object. Unlike count, countDoc does not accept a raw SQL WHERE definition.

db.reports.countDoc({
  'title ilike': '%throughput%'
}).then(reports => {
  // number of matching documents
});

findDoc

findDoc locates documents with either a criteria object or a primary key. Simple criteria objects (testing equality only) can leverage the GIN index on the table to improve query speed.

db.reports.findDoc(1).then(report => {
  // the report document body with the primary key
  // included
});

db.reports.findDoc({
  'title ilike': '%throughput%'
}).then(reports => {
  // all report documents matching the criteria
});

searchDoc

searchDoc performs a full-text search on the document body. You can specify fields, or omit in order to use the stored search vector to search the entire document.

db.reports.searchDoc({
  fields: ["title", "description"],
  term: "Kauai"
}.then(docs => {
  // reports returned with an on-the-fly full text search
  // for 'Kauai'
});

Persisting Documents

saveDoc

saveDoc inserts or updates a document, like save inserts or updates records in ordinary tables. If an id field is present in the document you pass, the corresponding record will be updated; otherwise, it's inserted.

There is one important distinction in how the two methods operate: saveDoc overwrites the entire document on updates! If you pass an incomplete document in, that's what gets persisted -- fields you don't specify will be gone. To modify documents without overwriting non-specified fields, see updateDoc.

saveDoc returns a promise for the updated document.

db.reports.saveDoc({
  title: 'Week 12 Throughput',
  lines: [{
    name: '1 East',
    numbers: [5, 4, 6, 6, 4]
  }, {
    name: '2 East',
    numbers: [4, 4, 4, 3, 7]
  }]
}).then(report => {
  // the newly created report
});

saveDocs

saveDocs inserts or updates a list of documents (must be all new or all existing).

db.books.saveDocs([
  {
    title: 'The Grapes of Wrath',
    author: 'John Steinbeck'
  } , {
    title: 'The Jungle',
    author: 'Upton Sinclair'
  }
]).then(books => {
  // handle the list of saved books
});

updateDoc

updateDoc adds and updates fields in an existing document or documents without replacing the entire body. Fields not defined in the changes object are not modified. updateDoc requires an ID or criteria object and a changes object, with optional options.

updateDoc may be used to alter values in any JSON or JSONB column, not just with document tables. However, if the JSON column name is overridden by passing options.body, there is an important change in behavior. Criteria are normally applied against the document body as with other document methods; however, when a new body is specified, criteria will be tested against the row as with other table methods. Likewise, the promise returned will be for the updated document with a document table, or for the entire row when updateDoc is invoked against another table.

db.reports.updateDoc(1, {
  title: 'Week 11 Throughput'
}).then(report => {
  // the updated report, with a changed 'title' attribute
});

db.products.updateDoc({
  type: 'widget'
}, {
  colors: ['gray', 'purple', 'red']
}, {
  body: 'info'
}).then(widgets => {
  // an array of widgets, now in at least three colors;
  // since products is not a document table (note the
  // 'info' field was specified to update), the 'type'
  // is tested against a column named type rather than
  // a key in the info JSON or JSONB column.
});