The Most Amazing PostgreSQL Database

For me PostgreSQL is the most amazing (open source) database around. Even though, there is much interest in stripped down NoSQL databases like key-value stores or “data structure servers”, PostgreSQL continues to innovate at the SQL frontier.

In this post, I’ll show a few of the newer, less known, features of PostgreSQL - far beyond standard SQL.

hstore

hstore is a key-value store for simple data types. Using hstore, we’re able to create a key-value store within columns of a table.

To enable the hstore extension, run create extension hstore' in the PostgreSQL prompt. After that the hstore data type is available for our table definitions.

Let’s create a simple table with a hstore column:

create table hstoretest ( id serial primary key, data hstore );

To insert a few rows, we use a special syntax:

insert into hstoretest (data) values ('key1 => 123, key2 => "text"'::hstore);

Query the table as usual:

select * from hstoretest;

 id |                           data
----+-----------------------------------------------------------
  1 | "key1"=>"123", "key2"=>"text"

The hstore extension provides a lot of operators and functions to work with hstore columns, for example, selecting all key2 values:

select data -> 'key2' as key2 from hstoretest;

 key2
------
 text
(1 row)

Some more examples can be found here.

JSON

A JSON data type was introduced in release 9.2. Currently this is nothing more than a validating data type, thus it checks if the string we put into that column is a valid JSON object.

Let’s create a new table to play around with this type:

create table jsontest ( id serial primary key, data json );

Now let’s insert an invalid row:

insert into jsontest (data) values ('{"title":wrong}');

ERROR:  invalid input syntax for type json
LINE 1: insert into jsontest (data) values ('{"title":wrong}');
                                            ^
DETAIL:  Token "wrong" is invalid.
CONTEXT:  JSON data, line 1: {"title":wrong...

And now with the correct JSON syntax:

insert into jsontest (data) values ('{"title":"right"}');

There isn’t really much more to the JSON data type besides the ability to return rows of non-JSON tables as JSON:

select row_to_json(hstoretest) from hstoretest;

                                       row_to_json
-----------------------------------------------------------------------------------------
 {"id":1,"data":"\"key1\"=>\"123\", \"key2\"=>\"text\""}
(1 row)

Nice if you’re used to work with JSON object (in Web applications for example).

PLv8

Working directly with JSON and JavaScript has been all the rage in many of the NoSQL databases. Using the PLv8 extension, we can use JavaScript (executed by Google’s awesome V8 engine) directly in PostgreSQL. Together with the JSON data type, this offers amazing new possibilities.

Currently PLv8 isn’t included in the standard distribution of PostgreSQL (9.2), but installing it isn’t very hard, the only dependencies are postgresql and the v8 engine. Some distributions already have v8 in their repositories (Archlinux).

Compiling and installing the extension is straight forward as soon as the dependencies are in place:

make && sudo make install

No we can enable the plv8 extension within our database (as we did with hstore):

create extension plv8;

A particular nice example of using JSON and PLv8 comes from Andrew Dunstan:

create or replace function jmember (j json, key text )
 RETURNS text
 LANGUAGE plv8
 IMMUTABLE
AS $function$
  var ej = JSON.parse(j);
  if (typeof ej != 'object')
        return NULL;
  return JSON.stringify(ej[key]);
$function$;

The jmember function allows us to parse and read the JSON string and returns the member identified by key:

select jmember(data, 'title') from jsontest;

     jmember
-----------------
 "right"
(1 row)

Andrew also shows how to build an index to speed up access times in his post.

k-Nearest Neighbors

In PostgreSQL 9.1, a nearest neighbor indexing was introduced. This allows us to perform orderings etc. by a distance metric.

For example, I’ve downloaded the ispell spelling dictionaries, and loaded them into a table words like this:

create table words (word varchar(50) primary key);
copy words from 'english.0';

This inserts roughly 50000 words into the table.

Since we’re working with text data, let’s introduce another extension pg_trgm, which builds tri-grams of strings (triples of three characters). Using theses tri-grams, we can compute a distance metric. Enable the extension like this:

create extension pg_trgm;

A tri-gram of hello would look like this:

select show_trgm('hello');
            show_trgm
---------------------------------
 {"  h"," he",ell,hel,llo,"lo "}
(1 row)

The distance metric is very simple, the more of these tri-grams match, the closer two strings are.

To take advantage of the nearest neighbor index, we have to build it:

create index word_trgm_idx on words using gist (word gist_trgm_ops);

Using the index we can query our table for a word, and return a list of most similar terms as well:

select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  word  | distance
--------+----------
 hello  |        0
 hellos |    0.375
 hell   | 0.428571
 hells  |      0.5
 heller | 0.555556
 hell's | 0.555556
 help   |    0.625
 helm   |    0.625
 held   |    0.625
 helps  | 0.666667
(10 rows)

The <-> operator comes from the pg_trgm extension, of course we could use simpler distances like numerical difference or geometric distance, but working with textual data is often perceived as particularly difficult (not so with PostgreSQL).

Now Postgres is more or less the first choice of people with experience running production databases. It’s more powerful, more reliable, and has a better set of features than any other open source data storage layer out there today.
— Peter van Hardenberg, Tech Lead, Heroku Postgres

Git Overview

My version control journey started with CVS, after that I looked at SVN, but never really used it. The shortcomings of centralized repositories were too obvious and with my increasing interest in Haskell I jumped on the distributed version control train with Darcs. I really, really liked it, but it had some nasty things too. After a while I was looking for something different and stumbled over Mercurial, again I was really happy with it but somehow my journey wasn’t over yet.

Continue to full post...

Mercurial Version Control Status in the ZSH Command Line Prompt

I sometimes forget to push or pull changes to or from a remote repository. To remedy the problem I wrote myself a little script to show me the status on the prompt.

Continue to full post...

Emacs Basics

It’s been a while since I wrote my Vim Introduction and Tutorial (exactly one year). A lot happened between now and then, I chose to get a better feeling about Emacs for example.

The reasons aren’t easily explained; The most prominent reason is the awesome AucTex-mode since I’m working heavily with LaTeX lately.

Anyways, learning Vim and Emacs is better than learning only one of them :-).

Continue to full post...

Addendum to "Time Machine for every Unix out there"

My article about using rsync to mimic the behavior of Apple’s Time Machine generated a lot of traffic, and more important, a lot of feedback.

In this article I’ll summarize and try to clarify a few things.

Continue to full post...

Time Machine for every Unix out there

rsync is one of the tools that have gradually infiltrated my day to day tool-box (aside Vim and Zsh).

Using rsync it’s very easy to mimic Mac OS X new feature called Time Machine. In this article I’ll show how to do it, but there is still a nice GUI missing – for those who like it shiny.

Continue to full post...

Linux peripheral devices configuration

This weekend I tried to get my new keyboard (Microsoft Ergonomic Keyboard 4000) and my Logitech MX1000 Laser mouse to work properly. The Keyboard has many extra-keys I didn’t bother to count, and the mouse has 12 buttons which can be very useful at times.

Almost accidently I solved a bugging performance problem with the Firefox browser. It was incredibly slow when opening Google Spreadsheets, well the whole system was incredibly slow while loading the spreadsheet… .

Continue to full post...

Vim Introduction and Tutorial

I often tried to learn the great Emacs editor/IDE/operating system. The last time I tried it, I spent some time getting comfortable with it until I wanted to customize my .emacs file.

That was the point when I entered vi .emacs. As soon as I realized what I’ve done, I knew that Vim has won me over a long time ago.

Continue to full post...

Haskell - Laziness

Haskell is a non-strict, or lazy, language. This means it evaluates expressions only when it needs their results.

Laziness is one of the things that make Haskell special – really special. Lazy evaluation allows easy handling of infinite data-structures.

Continue to full post...

Haskell Basics

The article I wrote yesterday was just the beginning, today we’ll look at the next step in becoming Haskell experts.

Yesterday we’ve learned how to split up our program and how to compile, or run it. Today we’ll look at some basic features of Haskell.

Continue to full post...

Haskell

Danger! If you are happy with your current knowledge of programming languages, don’t read on – Haskell might be responsible for some serious defects in your motivation.

I was always some kind of programming language geek. I loved learning and playing around with all kinds of programming languages1. Currently I’m trying to learn Haskell. Even though I learned functional programming at my university (two semesters using Haskell), I didn’t really learn how to interact with the outside world. This is where things start to get messy, no matter how beautiful the language is.

Continue to full post...

Using Google's Blogsearch

I’m using hobix to power my blog. Unfortunately it requires some work to get the dynamic components up and running (search and comments).

So I decided to use Google’s Blogsearch which is pretty powerful.

Continue to full post...

Darcs

Common version control tasks, and how they’re done using Darcs, a fresh approach to version control.

Continue to full post...

Installing SQLite, Lighttpd, FastCGI and Ruby bindings on Mac OS X

I’ve had a hard time installing SQLite, Lighttpd, FastCGI together with Ruby bindings on OS X, so I thought I could share my experiences.

Continue to full post...