Working With Documents
Around the late 00s, document databases like MongoDB or CouchDB started to be hailed as the next big thing. Modeling information in isolated hierarchies rather than as flat related sets, they made an enormous difference in how applications approached data. Although they were never going to replace relational databases, there are many use cases for which document stores are undeniably better-suited than the traditional RDBMS, so document databases are an important class of datastore in their own right.
PostgreSQL's JSONB functionality means there's no need to choose. Unstructured information can be captured in JSONB fields belonging to structured tables, blending the relational and document approaches. There's a lot to recommend a hybrid data model: in modern applications, a greater or lesser proportion of the data is relational and should be stored and accessed as such, but there's often some information that really is best represented as a rich, nested document with a flexible schema. In pure relational terms, data like this can only be represented through tortuous abstractions (if you've never seen a table with a self-joining foreign key representing a hierarchy, consider yourself fortunate) or as unsearchable, unindexable long text or BLOB fields.
The JSONB type is a great solution to this problem, and Massive takes care of the management overhead with a document table API.
Document Tables
Document tables exist for the sole purpose of storing JSONB data. You don't even need to create them until you know you need them, with db.saveDoc
and db.saveDocs
capable of creating them at runtime.
The *Doc
methods detailed here are intended to be the primary means of interaction with document tables. However, standard table functions still work on document tables, and can be quite useful especially when you've added supplemental fields to them! Criteria objects with the regular find
and other methods should use JSON traversal to specify conditions for body.myField.anArray[1].aField
.
findDoc
is still preferred to JSON queries if at all possible since it can take advantage of indexing to speed up equality searches with the @>
"contains" operator.
Primary Key Default Data Type
The default primary key data type used for all tables including document tables is serial
, which ordinarily begins at 1 for the first record created and increments by 1 for each subsequent record. However, it is possible to change the primary key data type used for new document tables to uuid
(Universally Unique Identifier).
The uuid-ossp
Postgres extension must be enabled for UUID keys.
A UUID is a string of 32 hexadecimal digits in five character groups, separated by hyphens. It's commonly used in concurrent or distributed environments, or for extremely large or high-churn tables where the limits on integer size become a concern. There are multiple "versions" of UUIDs, differentiated by the algorithm used to generate them.
Version | Description |
---|---|
v1 |
Partially deterministic, based on the server's MAC address. |
v1mc |
Variation on v1 which uses a random MAC address. |
v2 |
Unsupported. |
v3 |
Based on a namespace and value, and MD5 hashed. Currently unsupported. |
v4 |
The default version, fully pseudorandom. |
v5 |
Like v3 , but hashed with |
If documentPkType
is set in the Massive configuration object but uuidVersion
is left unspecified, the default is v4
. If partitioning or indexing document primary key values, v1mc
or v1
is recommended.
const db = await massive(connectionInfo, {
// set UUID primary keys for document tables
documentPkType: 'uuid',
// use v1mc generation
uuidVersion: 'v1mc'
});
Customizing Document Table Columns
Document tables may be extended with new columns and foreign keys. The id
type can be changed as well (so long as a default is set, such as uuid_generate_v1mc()
or uuid_generate_v4()
etc. for UUID types) without impeding usage of document table functions. Just don't remove any columns or change their names, since Massive depends on those.
If you use a migration framework or just want to generate your schema ahead of time, it's easiest to just copy the DDL out of the document table script. Fill in the placeholder tokens:
Name | Value |
---|---|
schema | Name of the schema owning the document table. |
table | Name of the document table. |
index | Document tables created through Massive use "schema_table", but you can fill in anything you like. |
pkType | Either SERIAL or UUID . |
pkDefault | (Omit if SERIAL pkType) one of the UUID generating functions. |
db.saveDoc
The connected database instance has a saveDoc
function. Passed a collection name (which can include a non-public schema) and a JavaScript object, this will create the document table if it doesn't already exist and write the object to it.
// create the reports table, if it doesn't exist, and
// add an initial document. It's returned with the new
// primary key value filled in.
const report = await db.saveDoc('reports', {
title: 'Week 12 Throughput',
lines: [{
name: '1 East',
numbers: [5, 4, 6, 6, 4]
}, {
name: '2 East',
numbers: [4, 4, 4, 3, 7]
}]
});
If the table already exists, you can still use db.saveDoc
, but you can also invoke saveDoc
on the table itself.
saveDocs
can be used for saving multiple documents. The documents being saved must be all new, or all existing.
const newBooks = await db.saveDocs('books', [
{title: 'The Grapes of Wrath'},
{title: 'The Jungle'}
]);
Querying Documents
A Note About Criteria
Document criteria can be a little more complex to work with. Massive will attempt to use the "contains" (@>
) operator against the index on the body
field to speed up document queries, but this is only possible when equality is the only comparison operation being invoked. If the criteria use other operations (including IN
), it's back to doing things the somewhat slower way. See the example criteria below:
// Simple equality always uses the index.
let docs = await db.docs.findDoc({
field1: 'value'
});
// Matching nested values exactly also uses the index,
// including searching arrays.
docs = await db.docs.findDoc({
objectfield: {
innervalue: 123
},
arrayfield: [{
'match': 'an object in arrayField'
}]
});
// Non-equality operations do _not_ use the index.
let docs = await db.docs.findDoc({
'field !=': 'value'
});
// `IN` also precludes using the index.
docs = await db.docs.findDoc({
'field1': [1, 2, 3]
});
Be careful with criteria which cannot use the index since they may result in poorly-performing queries with sufficiently large tables.
countDoc
Like its counterpart, countDoc
returns the number of extant documents matching a criteria object. Unlike count
, countDoc
does not accept a raw SQL WHERE
definition.
const count = await db.reports.countDoc({
'title ilike': '%throughput%'
});
findDoc
findDoc
locates documents with a criteria object or primary key.
const report = await db.reports.findDoc(1);
const throughput = await db.reports.findDoc({
'title ilike': '%throughput%'
});
searchDoc
searchDoc
performs a full-text search on the document body, similar to search
. You can specify fields
, or omit them in order to search the entire document.
const kauaiDocs = await db.reports.searchDoc({
fields: ["title", "description"],
term: "Kauai"
});
Persisting Documents
saveDoc
saveDoc
inserts or updates a document, like save
inserts or updates records in ordinary tables. If an id
field is present in the document you pass, the corresponding record will be updated; otherwise, it's inserted.
There is one important distinction in how the two methods operate: saveDoc
overwrites the entire document on updates! If you pass an incomplete document in, that's what gets persisted -- fields you don't specify will be gone. To modify documents without overwriting non-specified fields, see updateDoc
.
saveDoc
yields the updated document.
const report = await db.reports.saveDoc({
title: 'Week 12 Throughput',
lines: [{
name: '1 East',
numbers: [5, 4, 6, 6, 4]
}, {
name: '2 East',
numbers: [4, 4, 4, 3, 7]
}]
});
saveDocs
saveDocs
inserts or updates multiple documents, which must be all new or all existing.
const newBooks = await db.books.saveDocs([
{
title: 'The Grapes of Wrath',
author: 'John Steinbeck'
} , {
title: 'The Jungle',
author: 'Upton Sinclair'
}
]);
updateDoc
updateDoc
adds and updates fields in an existing document or documents without replacing the entire body. Fields not defined in the changes
object are not modified. updateDoc
requires an ID or criteria object and a changes object. Options may be passed as the third argument.
updateDoc
may be used to alter values in any JSON or JSONB column, not just with document tables, by setting options.body
. However, this setting has further effects. Without options.body
, criteria are tested against the document body as with other document methods, and the function yields the updated document(s). With a different options.body
set, criteria will be tested against the row as with other table methods, and the function yields the affected row or rows.
const updatedReport = await db.reports.updateDoc(1, {
title: 'Week 11 Throughput'
});
// Since products is not a document table (note the
// 'info' field was specified to update), the 'type'
// is tested against a column named type rather than
// a key in the info JSON or JSONB column, and the
// function yields products instead of info documents.
const updatedProducts = await db.products.updateDoc({
type: 'widget'
}, {
colors: ['gray', 'purple', 'red']
}, {
body: 'info'
});