A flumeview implemented on top of level.
Provides indexes which are persistent and can be streamed in order.
const Flume = require('flumedb')
const FlumelogOffset = require('flumelog-offset')
const FlumeviewLevel = require('flumeview-level')
const pull = require('pull-stream')

const flumedb = Flume(FlumelogOffset('/tmp/log.offset'))
const name = 'foo'

flumedb.use(
  name,
  FlumeviewLevel(1, function map(value) {
    return [value.foo] // must return an array
  })
)

flumedb.append({ foo: 'bar' }, function (err) {
  if (err) throw err

  // Query items from the index directly
  flumedb[name].get('bar', function (err, value) {
    if (err) throw err
    console.log(value) // => { foo: 'bar' }
  })

  // Or query ranges via pull-streams
  pull(flumedb[name].read({ gte: 'bar', live: true }), pull.drain(console.log))
})
FlumeviewLevel(version, map) => function
version
The version of the view. Incrementing this number will cause the view to be re-built
map
A function with signature (value, seq), where value is the item from the log coming past, and seq is the location of that value in the flume log.
This function must return an Array that's either empty or contains unique index key(s).
These index keys can then be queried to retrieve the stored value (see get and read below).
Examples of index key(s) you might return:
[] - i.e. don't add any index entries for this value
['@mix'] - make an index entry for this value under the string @mix
['@mix', '@mixmix'] - make index entries for this value under both @mix AND @mixmix
[['@mix', 1524805117433]] - make an index entry for this value under the key ['@mix', 1524805117433] (anything can be a key in leveldb)
This last case is useful when you might want multiple entries under a particular key like @mix. If you just use @mix, the index entry will get overwritten by future values coming in with the same key.
Extending the key to include some unique aspect (like a timestamp or the seq of the value) means you can have multiple index entries in your view which share a similar key.
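To see why a bare key gets overwritten, here is a minimal sketch that uses a plain Map as a stand-in for the index. This is an assumption for illustration only: the real view stores its keys in level, the `buildIndex` helper is hypothetical, the array position plays the role of seq, and the `mention` and `timestamp` fields are made up.

```javascript
// A plain Map stands in for the index here; it is NOT how flumeview-level
// stores data, it only mimics "one entry per key".
function buildIndex(values, map) {
  const index = new Map()
  values.forEach(function (value, seq) {
    map(value, seq).forEach(function (key) {
      // Serialise the key so that array keys are compared by content
      index.set(JSON.stringify(key), seq)
    })
  })
  return index
}

// Hypothetical log items
const items = [
  { mention: '@mix', timestamp: 1524805117433 },
  { mention: '@mix', timestamp: 1524805120000 }
]

// Bare key: the second item replaces the first
const bare = buildIndex(items, function (value) {
  return [value.mention]
})

// Compound key: both items keep their own entry
const compound = buildIndex(items, function (value) {
  return [[value.mention, value.timestamp]]
})

console.log(bare.size) // 1
console.log(compound.size) // 2
```

With the bare key only the most recent item is reachable under @mix; with the compound key both items stay in the index and can be found with a range query.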
e.g. flumeview-search is a flumeview which takes the text from incoming values and builds an index which can be searched.
It takes a sentence like "Learn about leveldb" and maps it into 3 index keys like ['learn', 'about', 'leveldb'], each of which will point back to the sentence "Learn about leveldb".
In practice the 3 keys need to be more unique if we don't want there to be only one index entry for learn, e.g. [['learn', 145], ['about', 145], ['leveldb', 145]] means we can later add a key ['learn', 2034] and it will be distinct from ['learn', 145].
Here 145 and 2034 are just unique numbers which keep each key unique; using seq or a timestamp is common for this.
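A map function along these lines could look roughly like the sketch below. It is not taken from flumeview-search; `mapWords` and the `text` field are hypothetical names chosen for illustration.

```javascript
// Hypothetical map function: one index key per distinct word, made unique
// by pairing the word with the seq of the value it came from.
function mapWords(value, seq) {
  const words = value.text.toLowerCase().split(/\W+/).filter(Boolean)
  // A Set drops repeated words so the returned keys stay unique
  return Array.from(new Set(words)).map(function (word) {
    return [word, seq]
  })
}

const wordKeys = mapWords({ text: 'Learn about leveldb' }, 145)
console.log(wordKeys)
// [ [ 'learn', 145 ], [ 'about', 145 ], [ 'leveldb', 145 ] ]
```

A function like this would be passed as the second argument to FlumeviewLevel, in place of the map in the example at the top.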
function
flumeview-level returns a function which follows the flumeview pattern, enabling it to be installed into a flumedb.
get(key, cb)
This is a method that gets attached to the flumedb after you install your flumeview (see example above). It looks up key in the index and calls cb with the stored value.
The keys for the values in map above would be '@mix', '@mixmix', or ['@mix', 1524805117433].
read(opts) => pull-stream
opts is similar to a level db query (see level docs), e.g.
{
  live: true, // this is an addition to the classic query options of level
  gte: '@mi', // gte = greater than or equal to
  lt: undefined, // lt = less than
  reverse: true,
  keys: true,
  values: true,
  seqs: false,
}
If you've created indexes whose keys are Arrays (quite likely), you need to understand how Arrays and other values are ordered by leveldb. This is because using leveldb is all about ordering keys so that you can do queries efficiently. Because of the way a log-structured merge-tree works (which is what level is), it can read adjacent records quickly (with a single seek), but jumping around is not as fast. Read about the key ordering that flumeview-level uses in the bytewise documentation (it actually uses charwise under the hood, but follows the bytewise spec).
Example of more advanced query:
{
  gte: ['@mix', 1524720269458],
  lte: ['@mix', undefined],
}
Assume this is an index where the keys are of the form [@mentions, timestamp]. This query will then get all mentions which are _exactly_ '@mix' and which happened at or after around 5pm NZT on 2018-04-26 (note that undefined is the highest value in the bytewise comparator).
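The range above can be checked by hand with a simplified comparator. The sketch below is not charwise or bytewise; it is a hypothetical `compare` that only handles the types used in this document (null, numbers, strings, arrays, undefined) and follows the same relative order. The sample keys are made up.

```javascript
// Simplified stand-in for bytewise ordering, covering only the types used
// here: null < numbers < strings < arrays < undefined.
function rank(x) {
  if (x === null) return 0
  if (typeof x === 'number') return 1
  if (typeof x === 'string') return 2
  if (Array.isArray(x)) return 3
  if (x === undefined) return 4
  throw new TypeError('type not covered by this sketch')
}

function compare(a, b) {
  const ra = rank(a)
  const rb = rank(b)
  if (ra !== rb) return ra - rb
  if (ra === 1) return a - b
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0
  if (ra === 3) {
    // Arrays compare element by element; a shorter prefix sorts first
    const n = Math.min(a.length, b.length)
    for (let i = 0; i < n; i++) {
      const c = compare(a[i], b[i])
      if (c !== 0) return c
    }
    return a.length - b.length
  }
  return 0 // null vs null, or undefined vs undefined
}

// Hypothetical index keys of the form [mention, timestamp]
const mentionKeys = [
  ['@mix', 1524700000000],
  ['@mix', 1524720269458],
  ['@mix', 1524805117433],
  ['@mixmix', 1524805117433],
  ['@matt', 1524805117433]
]

const query = { gte: ['@mix', 1524720269458], lte: ['@mix', undefined] }

const matches = mentionKeys
  .filter(function (key) {
    return compare(key, query.gte) >= 0 && compare(key, query.lte) <= 0
  })
  .sort(compare)

console.log(matches)
// [ [ '@mix', 1524720269458 ], [ '@mix', 1524805117433 ] ]
```

Note that ['@mixmix', ...] falls outside the range because its first element already sorts after '@mix', so the timestamp is never compared.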
If you wanted to get all mentions which started with @m you could use:
{
  gte: ['@m', null],
  lt: ['@m~', undefined],
}
Here null is the lowest value in the comparator, and the ~ is just a slightly unreliable hack to catch values below @m~, as ~ is quite a high character (e.g. above Z) in lexicographic ordering. There are higher characters, but English speakers are less likely to type them; check ltgt to generate reliable limiting values.
Here are some lexicographically ordered strings to help you catch the vibe: '@m', '@manowar', '@ma~', '@mo', '@m~', '@nevernever'
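The ordering can be reproduced with a plain string sort, which is enough to show both the prefix range and where the ~ hack breaks down. This is a sketch over made-up names; it uses JavaScript's default sort rather than the view's own comparator, which agrees with byte order for ASCII strings.

```javascript
const names = ['@nevernever', '@m~', '@mo', '@manowar', '@m', '@ma~']

// The default sort compares UTF-16 code units, which matches byte order
// for plain ASCII strings like these
const sorted = names.slice().sort()
console.log(sorted)
// [ '@m', '@manowar', '@ma~', '@mo', '@m~', '@nevernever' ]

// Same bounds as the query above, applied to bare strings
const startsWithM = sorted.filter(function (n) {
  return n >= '@m' && n < '@m~'
})
console.log(startsWithM)
// [ '@m', '@manowar', '@ma~', '@mo' ]

// Why the hack is unreliable: 'é' sorts above '~', so this name starts
// with @m but falls outside the range
const caught = '@mé' < '@m~'
console.log(caught) // false
```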
MIT