Working With Documents

Around the late 00s, document databases like MongoDB or CouchDB started to be hailed as the next big thing. Modeling information in isolated hierarchies rather than as flat related sets, they made an enormous difference in how applications approached data. Although they were never going to replace relational databases, there are many use cases for which document stores are undeniably better-suited than the traditional RDBMS, so document databases are an important class of datastore in their own right.

PostgreSQL's JSONB functionality means there's no need to choose. Unstructured information can be captured in JSONB fields belonging to structured tables, blending the relational and document approaches. There's a lot to recommend a hybrid data model: in modern applications, a greater or lesser proportion of the data is relational and should be stored and accessed as such, but there's often some information that really is best represented as a rich, nested document with a flexible schema. In pure relational terms, data like this can only be represented through tortuous abstractions (if you've never seen a table with a self-joining foreign key representing a hierarchy, consider yourself fortunate) or as unsearchable, unindexable long text or BLOB fields.

The JSONB type is a great solution to this problem, and Massive takes care of the management overhead with a document table API.

Document Tables
Querying Documents
Persisting Documents

Document Tables

Document tables exist for the sole purpose of storing JSONB data. You don't even need to create them until you know you need them, with db.saveDoc and db.saveDocs capable of creating them at runtime.

The *Doc methods detailed here are intended to be the primary means of interaction with document tables. However, standard table functions still work on document tables, and can be quite useful especially when you've added supplemental fields to them! Criteria objects with the regular find and other methods should use JSON traversal to specify conditions for body.myField.anArray[1].aField.

findDoc is still preferred to JSON queries if at all possible since it can take advantage of indexing to speed up equality searches with the @> "contains" operator.

Primary Key Default Data Type

The default primary key data type used for all tables including document tables is serial, which ordinarily begins at 1 for the first record created and increments by 1 for each subsequent record. However, it is possible to change the primary key data type used for new document tables to uuid (Universally Unique Identifier).

The uuid-ossp Postgres extension must be enabled for UUID keys.

A UUID is a string of 32 hexadecimal digits in five character groups, separated by hyphens. It's commonly used in concurrent or distributed environments, or for extremely large or high-churn tables where the limits on integer size become a concern. There are multiple "versions" of UUIDs, differentiated by the algorithm used to generate them.

Version	Description
`v1`	Partially deterministic, based on the server's MAC address.
`v1mc`	Variation on `v1` which uses a random MAC address.
`v2`	Unsupported.
`v3`	Based on a namespace and value, and MD5 hashed. Currently unsupported.
`v4`	The default version, fully pseudorandom.
`v5`	Like `v3`, but hashed with SHA-1 instead. Currently unsupported.

If documentPkType is set in the Massive configuration object but uuidVersion is left unspecified, the default is v4. If partitioning or indexing document primary key values, v1mc or v1 is recommended.

const db = await massive(connectionInfo, {
  // set UUID primary keys for document tables
  documentPkType: 'uuid',

  // use v1mc generation
  uuidVersion: 'v1mc'
});

Customizing Document Table Columns

Document tables may be extended with new columns and foreign keys. The id type can be changed as well (so long as a default is set, such as uuid_generate_v1mc() or uuid_generate_v4() etc. for UUID types) without impeding usage of document table functions. Just don't remove any columns or change their names, since Massive depends on those.

If you use a migration framework or just want to generate your schema ahead of time, it's easiest to just copy the DDL out of the document table script. Fill in the placeholder tokens:

Name	Value
schema	Name of the schema owning the document table.
table	Name of the document table.
index	Document tables created through Massive use "schema_table", but you can fill in anything you like.
pkType	Either `SERIAL` or `UUID`.
pkDefault	(Omit if `SERIAL` pkType) one of the UUID generating functions.

db.saveDoc

The connected database instance has a saveDoc function. Passed a collection name (which can include a non-public schema) and a JavaScript object, this will create the document table if it doesn't already exist and write the object to it.

// create the reports table, if it doesn't exist, and
// add an initial document. It's returned with the new
// primary key value filled in.
const report = await db.saveDoc('reports', {
  title: 'Week 12 Throughput',
  lines: [{
    name: '1 East',
    numbers: [5, 4, 6, 6, 4]
  }, {
    name: '2 East',
    numbers: [4, 4, 4, 3, 7]
  }]
});

If the table already exists, you can still use db.saveDoc, but you can also invoke saveDoc on the table itself.

saveDocs can be used for saving multiple documents. The documents being saved must be all new, or all existing.

const newBooks = await db.saveDocs('books', [
  {title: 'The Grapes of Wrath'},
  {title: 'The Jungle'}
]);

Querying Documents

A Note About Criteria

Document criteria can be a little more complex to work with. Massive will attempt to use the "contains" (@>) operator against the index on the body field to speed up document queries, but this is only possible when equality is the only comparison operation being invoked. If the criteria use other operations (including IN), it's back to doing things the somewhat slower way. See the example criteria below:

// Simple equality always uses the index.
let docs = await db.docs.findDoc({
  field1: 'value'
});

// Matching nested values exactly also uses the index,
// including searching arrays.
docs = await db.docs.findDoc({
  objectfield: {
    innervalue: 123
  },
  arrayfield: [{
    'match': 'an object in arrayField'
  }]
});

// Non-equality operations do _not_ use the index.
let docs = await db.docs.findDoc({
  'field !=': 'value'
});

// `IN` also precludes using the index.
docs = await db.docs.findDoc({
  'field1': [1, 2, 3]
});

Be careful with criteria which cannot use the index since they may result in poorly-performing queries with sufficiently large tables.

countDoc

Like its counterpart, countDoc returns the number of extant documents matching a criteria object. Unlike count, countDoc does not accept a raw SQL WHERE definition.

const count = await db.reports.countDoc({
  'title ilike': '%throughput%'
});

findDoc

findDoc locates documents with a criteria object or primary key.

const report = await db.reports.findDoc(1);

const throughput = await db.reports.findDoc({
  'title ilike': '%throughput%'
});

searchDoc

searchDoc performs a full-text search on the document body, similar to search. You can specify fields, or omit them in order to search the entire document.

const kauaiDocs = await db.reports.searchDoc({
  fields: ["title", "description"],
  term: "Kauai"
});

Persisting Documents

saveDoc

saveDoc inserts or updates a document, like save inserts or updates records in ordinary tables. If an id field is present in the document you pass, the corresponding record will be updated; otherwise, it's inserted.

There is one important distinction in how the two methods operate: saveDoc overwrites the entire document on updates! If you pass an incomplete document in, that's what gets persisted -- fields you don't specify will be gone. To modify documents without overwriting non-specified fields, see updateDoc.

saveDoc yields the updated document.

const report = await db.reports.saveDoc({
  title: 'Week 12 Throughput',
  lines: [{
    name: '1 East',
    numbers: [5, 4, 6, 6, 4]
  }, {
    name: '2 East',
    numbers: [4, 4, 4, 3, 7]
  }]
});

saveDocs

saveDocs inserts or updates multiple documents, which must be all new or all existing.

const newBooks = await db.books.saveDocs([
  {
    title: 'The Grapes of Wrath',
    author: 'John Steinbeck'
  } , {
    title: 'The Jungle',
    author: 'Upton Sinclair'
  }
]);

updateDoc

updateDoc adds and updates fields in an existing document or documents without replacing the entire body. Fields not defined in the changes object are not modified. updateDoc requires an ID or criteria object and a changes object. Options may be passed as the third argument.

updateDoc may be used to alter values in any JSON or JSONB column, not just with document tables, by setting options.body. However, this setting has further effects. Without options.body, criteria are tested against the document body as with other document methods, and the function yields the updated document(s). With a different options.body set, criteria will be tested against the row as with other table methods, and the function yields the affected row or rows.

const updatedReport = await db.reports.updateDoc(1, {
  title: 'Week 11 Throughput'
});

// Since products is not a document table (note the
// 'info' field was specified to update), the 'type'
// is tested against a column named type rather than
// a key in the info JSON or JSONB column, and the
// function yields products instead of info documents.
const updatedProducts = await db.products.updateDoc({
  type: 'widget'
}, {
  colors: ['gray', 'purple', 'red']
}, {
  body: 'info'
});