The Most Amazing PostgreSQL Database

For me PostgreSQL is the most amazing (open source) database around. Even though, there is much interest in stripped down NoSQL databases like key-value stores or “data structure servers”, PostgreSQL continues to innovate at the SQL frontier.

In this post, I’ll show a few of the newer, less known, features of PostgreSQL - far beyond standard SQL.

hstore

hstore is a key-value store for simple data types. Using hstore, we’re able to create a key-value store within columns of a table.

To enable the hstore extension, run create extension hstore' in the PostgreSQL prompt. After that the hstore data type is available for our table definitions.

Let’s create a simple table with a hstore column:

create table hstoretest ( id serial primary key, data hstore );

To insert a few rows, we use a special syntax:

insert into hstoretest (data) values ('key1 => 123, key2 => "text"'::hstore);

Query the table as usual:

select * from hstoretest;

 id |                           data
----+-----------------------------------------------------------
  1 | "key1"=>"123", "key2"=>"text"

The hstore extension provides a lot of operators and functions to work with hstore columns, for example, selecting all key2 values:

select data -> 'key2' as key2 from hstoretest;

 key2
------
 text
(1 row)

Some more examples can be found here.

JSON

A JSON data type was introduced in release 9.2. Currently this is nothing more than a validating data type, thus it checks if the string we put into that column is a valid JSON object.

Let’s create a new table to play around with this type:

create table jsontest ( id serial primary key, data json );

Now let’s insert an invalid row:

insert into jsontest (data) values ('{"title":wrong}');

ERROR:  invalid input syntax for type json
LINE 1: insert into jsontest (data) values ('{"title":wrong}');
                                            ^
DETAIL:  Token "wrong" is invalid.
CONTEXT:  JSON data, line 1: {"title":wrong...

And now with the correct JSON syntax:

insert into jsontest (data) values ('{"title":"right"}');

There isn’t really much more to the JSON data type besides the ability to return rows of non-JSON tables as JSON:

select row_to_json(hstoretest) from hstoretest;

                                       row_to_json
-----------------------------------------------------------------------------------------
 {"id":1,"data":"\"key1\"=>\"123\", \"key2\"=>\"text\""}
(1 row)

Nice if you’re used to work with JSON object (in Web applications for example).

PLv8

Working directly with JSON and JavaScript has been all the rage in many of the NoSQL databases. Using the PLv8 extension, we can use JavaScript (executed by Google’s awesome V8 engine) directly in PostgreSQL. Together with the JSON data type, this offers amazing new possibilities.

Currently PLv8 isn’t included in the standard distribution of PostgreSQL (9.2), but installing it isn’t very hard, the only dependencies are postgresql and the v8 engine. Some distributions already have v8 in their repositories (Archlinux).

Compiling and installing the extension is straight forward as soon as the dependencies are in place:

make && sudo make install

No we can enable the plv8 extension within our database (as we did with hstore):

create extension plv8;

A particular nice example of using JSON and PLv8 comes from Andrew Dunstan:

create or replace function jmember (j json, key text )
 RETURNS text
 LANGUAGE plv8
 IMMUTABLE
AS $function$
  var ej = JSON.parse(j);
  if (typeof ej != 'object')
        return NULL;
  return JSON.stringify(ej[key]);
$function$;

The jmember function allows us to parse and read the JSON string and returns the member identified by key:

select jmember(data, 'title') from jsontest;

     jmember
-----------------
 "right"
(1 row)

Andrew also shows how to build an index to speed up access times in his post.

k-Nearest Neighbors

In PostgreSQL 9.1, a nearest neighbor indexing was introduced. This allows us to perform orderings etc. by a distance metric.

For example, I’ve downloaded the ispell spelling dictionaries, and loaded them into a table words like this:

create table words (word varchar(50) primary key);
copy words from 'english.0';

This inserts roughly 50000 words into the table.

Since we’re working with text data, let’s introduce another extension pg_trgm, which builds tri-grams of strings (triples of three characters). Using theses tri-grams, we can compute a distance metric. Enable the extension like this:

create extension pg_trgm;

A tri-gram of hello would look like this:

select show_trgm('hello');
            show_trgm
---------------------------------
 {"  h"," he",ell,hel,llo,"lo "}
(1 row)

The distance metric is very simple, the more of these tri-grams match, the closer two strings are.

To take advantage of the nearest neighbor index, we have to build it:

create index word_trgm_idx on words using gist (word gist_trgm_ops);

Using the index we can query our table for a word, and return a list of most similar terms as well:

select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  word  | distance
--------+----------
 hello  |        0
 hellos |    0.375
 hell   | 0.428571
 hells  |      0.5
 heller | 0.555556
 hell's | 0.555556
 help   |    0.625
 helm   |    0.625
 held   |    0.625
 helps  | 0.666667
(10 rows)

The <-> operator comes from the pg_trgm extension, of course we could use simpler distances like numerical difference or geometric distance, but working with textual data is often perceived as particularly difficult (not so with PostgreSQL).

Now Postgres is more or less the first choice of people with experience running production databases. It’s more powerful, more reliable, and has a better set of features than any other open source data storage layer out there today.
— Peter van Hardenberg, Tech Lead, Heroku Postgres

Comments

comments powered by Disqus