PostgreSQL Magic

In my last post I demonstrated how stored procedures can boost performance in PostgreSQL. Today, I want to show you some more tricks that can come in handy when working with PostgreSQL databases.

In the following, I have put together some very useful PostgreSQL features, which would be hard to replicate in MySQL. You’ll learn about sequences, inserting and selecting data with one query, arrays, and user-defined data types. In addition, I’ll give you a couple hints on what else can be done with these approaches.

PostgreSQL Sequences

Let’s take a look at sequences first. In MySQL you get new primary keys by using the AUTO_INCREMENT feature. Oracle and PostgreSQL provide sequences instead, which offer nice control possibilities. For demonstration purposes, let’s create a new customer table:

clock_timestamp() , unlike now(), provides the actual current time and therefore is more accurate. While now() is called only once at the start of the statement execution, the value of clock_timestamp() changes during the execution and thus always provides the exact time for each table entry.

A fully controllable sequence with the name customer_id_seq is automatically created with the table:

Now the sequence starts with 100 and is incremented by 10 each time it is used. We can call the next value by using SELECT nextval('customer_id_seq'); which returns 100.

Although MySQL’s AUTO_INCREMENT feature offers similar possibilities to adjust increment size and offset, sequences in PostgreSQL on the other hand are not bound to tables. Sequences can be utilized whenever unique IDs are required, e.g. in case you need unique IDs across several tables, additional, globally unique IDs for orders, or even if all IDs within a database need to be globally unique—the latter being a rather extreme example.

In addition, you can call sequences in PostreSQL without inserting any data into the table. This can sometimes be useful, for example when you need a unique ID before you actually insert any data. In MySQL, there is no built-in support for anything of this sort.

Inserting and Selecting Data with one Query

Isn’t it annoying that, when using MySQL, you have to call the last_insert_id () function each time you want to get the newly generated primary key after you inserted an entry? PostgreSQL provides a solution to this:

Returns: 110 (Remember? We increased the increment size of newly generated sequence values.)

As you can see, it only requires one database call to insert and select data simultaneously. Wouldn’t it be nice to get other columns back as well? What about columns the database generates by default, e.g. created? No problem in PostgreSQL. We can even return those columns whose value we don’t know beforehand. Let’s insert two entries at once and define which columns should be returned:

Returns:

The created column is returned for each entry and the respective timestamps are accurate due to the usage of clock_timestamp().

Arrays

Arrays are one of the most useful features in PostgreSQL. They can be used in a variety of ways. Here are some examples of cases where they may be used:

  • Defining columns of a table
  • Storing a series of entries in a stored procedure
  • Inserting big chunks of data into a stored procedure
  • Comparisons
  • Returning data from database in one variable

Let’s play around with arrays:

array_agg(…) returns the rows in an array: {110,120,130}.

Inserting more than one entry into the customer table naturally creates more than one ID. So if you return IDs as described in the section above, several results will be returned. This is very useful when the database is called from application side, but in the case of stored procedures it is more convenient to get back an array instead. This can be achieved by using a WITH clause, which can be thought of as a temporary table that only exists for one query:

The temporary table inserted contains the returned IDs and we can use it to get back the IDs as an array: {140,150}.

As mentioned above, arrays can also be utilized for comparison. So instead of using the “normal” IN clause we might do the following: SELECT 1 = ANY(ARRAY[1,2,3]);, which returns true.

But there is a pitfall if you want to use negation: SELECT 4 != ANY(ARRAY[1,2,3]); returns true like you would expect. But SELECT 4 != ANY(ARRAY[1,2,3,4]); also returns true, because the array contains at least one entry which doesn’t equal 4. So ANY returns true correctly in this case, even though there is a four within the array.

If we want the statement to return false when the negation isn’t true for all values of an array, we need to use ALL:

SELECT 4 != ALL(ARRAY[1,2,3]); returns true.

SELECT 4 != ALL(ARRAY[1,2,3,4]); returns false as you would expect.

User-Defined Data Types

Especially regarding stored procedures it is very helpful to have combined data types. This can easily be done:

The created type can now be used like this:

::customer_type is a type cast which returns the user type variable (1,Testy5,Tester5,test5@project-a.com). If a variable within a stored procedure is defined as c customer_type;, you can set individual values with c.id := 1;. Of course, you can select data the same way by using SElECT c.id;, which returns 1.

Arrays reveal another feature when used in combination with UNNEST:

Returns:

As you can see, UNNEST extracts each entry of an array and converts it into a row. This is very useful in case you want to insert more than one entry.

Returns: {160,170}

The example above is not so different from what you already know, but you can use the same approach for updates too:

Returns: {160,170}

Combining Insert, Update, and Delete

All of the above approaches are useful in their own right. Their full potential, however, unfolds when used in combination. You can combine insert, update, and delete in a single database call. Moreover, you can use this combination not only for inserting single entries but even arrays of entries. All this is very valuable when handling databases and provides a lot of potential regarding performance.

How to do it? Well, I thought I’d leave that one for you to puzzle over. If you know a solution, leave a comment below and join us or one of our ventures’ IT teams! And if you don’t know how to do it, you can always still apply for a job with us and we’ll be happy to show you! 😉


Copyright 2015 Project A Ventures | All code in this post is licensed under the MIT License unless otherwise declared.

about the author

Flo has been our CTO with special expertise in Java from the very beginning. Additionally, he’s responsible for QA and system administration. He selects CTOs for our ventures and supports them with IT coaching. Before, Flo built companies like Edarling, Zalando, and Citydeal. He also worked for Mundwerk AG.
  • Very nice article! I would love to see more of the background / reasons why these features are useful and how they integrate into the bigger picture.

    • Jan Harasym

      The “returning” functionality is very handy for, example, websites which need to continue processing information with your user account after insertion. I actually had a situation where developers didn’t know about the “returning” functionality.
      which was a problem as they kept on mysql because they knew the function.

  • Nitpick. now() returns the start time of the current transaction. statement_timestamp() the start time of the current statement.

  • Raimundo Onetto

    Loved the article!!

    I’ll give it a shot … something like this??

    CREATE TEMP TABLE a (id integer, value text);
    CREATE TEMP TABLE b (id integer, value text);

    INSERT INTO a VALUES (1,’hi’),(2,’bye’);

    WITH
    updated AS (UPDATE a SET value =’hello’ WHERE id = 1 RETURNING *),
    inserted AS (INSERT INTO b SELECT * FROM updated RETURNING *)
    DELETE FROM a WHERE id NOT IN (SELECT id FROM inserted);

    • Flo

      Thanks for your comment! Here’s a hint: Assume you have an array of entries while each entry consists of an ID and a name. For new entries, the ID can be null. So you can perform a complete diff between the data in your table and your array. This can be done in one statement and it doesn’t require an additional table.

  • Flo

    Hi Drahflow, thanks for your comment! Your approach goes in the right
    direction, but the actual idea was to use an array as data source and
    perform a diff against the table (using insert, update, delete). Of course,
    this makes more sense when you’re dealing with complete user data and
    not just IDs in a real life scenario.

  • Flo

    Hi Redsmin, thanks for your suggestion! I tested the UUID approach. An
    indexed bigint column is approx. as fast as a column of type UUID, but
    the latter requires about 50% more space. A normal indexed text
    column, however, is twice as slow and requires 150% more space than the
    indexed bigint column.

    • Indeed, it definitely depends on your use-case (e.g. less-contention with UUIDs vs space-requirements)