• Postgres pivot table using JSON

    Creating a pivot table is something I need to do often, but it can be difficult in SQL. As an example, imagine wanting to see customers and their revenue by month. It is straightforward to create a normal data set where the dates are the rows and each row has a revenue amount. Something like this:

    dte total
    2022-01-01 22030
    2022-02-01 22753
    2022-03-01 0
    2022-04-01 9456
    2022-05-01 7798
    2022-06-01 38278
    2022-07-01 18736
    2022-08-01 6794
    2022-09-01 21033
    2022-10-01 28576
    2022-11-01 10172
    2022-12-01 41901

    But you quickly run into two obstacles as you try to take it further. Either you want to have the months as columns, like this:

    jan feb mar apr may jun jul aug sep oct nov dec
    22030 22753 0 9456 7798 38278 18736 6794 21033 28576 10172 41901

    or you want to see multiple customers, which is awkward to add as another column. Harder still is having the months as columns and the customers as rows:

    cus_id jan feb mar apr may jun jul aug sep oct nov dec
    1 0 10170 0 5399 0 14821 7927 0 14 15466 3675 14447
    2 22030 12583 0 4057 7798 23457 10809 6794 21019 13110 6497 27454

    The term for this is a pivot table, which is something you may have done many times in Excel or another spreadsheet application.

    But this is difficult in SQL, because SQL requires you to have a static column list. You can’t ask SQL to give you whatever columns are necessary; you must declare them in your query. (SELECT * may seem like an exception to this rule, but in this case the SQL engine still knows what the columns are going to be before the query is executed.)

    Luckily Postgres gives you a way around this. If you want actual columns you still have to specify them, but the hard part, aggregating into these pivoted columns and rows, is made much easier. The key is using the JSON functionality, which allows you to represent complex values in a single cell. You can aggregate many values into a single JSON value, and then pull them back apart after the fact.
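
    In miniature, the aggregate-then-extract idea can be sketched like this. The inline rows and the names data, result and pivot_data are made up for illustration; only JSONB_OBJECT_AGG and the ->> operator are the real Postgres pieces.

    -- Toy data: one row per (customer, month). The values are invented.
    WITH data (customer_id, month, total) AS (
      VALUES
          (1, '2022-01', 100)
        , (1, '2022-02', 250)
        , (2, '2022-01', 75)
    ), result AS (
      -- Collapse each customer's rows into one JSON object keyed by month.
      SELECT
          customer_id
        , JSONB_OBJECT_AGG(month, total) AS pivot_data
      FROM data
      GROUP BY customer_id
    )
    SELECT
        customer_id
      , pivot_data             AS raw_json
      , pivot_data->>'2022-01' AS jan
      , pivot_data->>'2022-02' AS feb  -- NULL for customer 2, which has no February row
    FROM result
    ORDER BY customer_id;

    The raw_json column is there only to make the intermediate value visible: each customer ends up with a single cell holding every month, and the outer SELECT picks out the keys it wants as ordinary columns.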

    Here is an example of what this looks like:

    WITH columns AS (
      SELECT
           generate_series dte
         , customer_id
      FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
      CROSS JOIN
        (
          SELECT DISTINCT
            customer_id
          FROM
            invoice
        ) AS customers
    ), data AS (
      SELECT
          columns.dte
        , columns.customer_id
        , SUM(COALESCE(invoice.amount, 0)) total
      FROM columns
      LEFT OUTER
        JOIN invoice
        ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
        AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
        AND columns.customer_id = invoice.customer_id
      GROUP BY
          columns.dte
        , columns.customer_id
    ), result AS (
      SELECT
        customer_id
        , JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
      FROM
        data
      GROUP BY
        customer_id
    )
    SELECT
        customer_id
      , (pivotData->>'2022-01') "jan"
      , (pivotData->>'2022-02') "feb"
      , (pivotData->>'2022-03') "mar"
      , (pivotData->>'2022-04') "apr"
      , (pivotData->>'2022-05') "may"
      , (pivotData->>'2022-06') "jun"
      , (pivotData->>'2022-07') "jul"
      , (pivotData->>'2022-08') "aug"
      , (pivotData->>'2022-09') "sep"
      , (pivotData->>'2022-10') "oct"
      , (pivotData->>'2022-11') "nov"
      , (pivotData->>'2022-12') "dec"
    FROM
      result
    ORDER BY customer_id
    

    This gives the results we were after above.
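
    One detail worth knowing: the ->> operator returns text, so the month columns come back as text rather than numbers. If you need to sort or do arithmetic on them, a cast is one way to handle it. Here is a small sketch, using a hand-written stand-in for the result CTE (the alias pivot_data and the jan_feb_total column are my own illustration):

    -- Stand-in for the result CTE, with customer 1's January and February values.
    WITH result (customer_id, pivot_data) AS (
      VALUES (1, '{"2022-01": 0, "2022-02": 10170}'::jsonb)
    )
    SELECT
        customer_id
      , (pivot_data->>'2022-01')::numeric AS jan
      , (pivot_data->>'2022-02')::numeric AS feb
        -- Arithmetic only works once the text values are cast.
      , (pivot_data->>'2022-01')::numeric
        + (pivot_data->>'2022-02')::numeric AS jan_feb_total
    FROM result;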

    If you would like to see how this is done, I have an interactive fiddle you can play with that shows you step by step how each of these parts works:

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=39e115cb8afd6e62c0101286ecd08a3f

    This example is using PG 15, but this functionality works all the way back to PG 9.5.

    This query is also a great example of using generate_series to generate a set of data to join against, so that you can find any holes and represent all the data points you need (months in this case), even if there is no actual data for that point.
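
    The gap-filling idea on its own can be sketched as follows. The sales rows and column names here are invented for the example; the pattern is to generate every month first and then LEFT JOIN the real data onto it.

    WITH months AS (
      -- One row per month, whether or not anything was sold in it.
      SELECT month_start::date AS dte
      FROM generate_series(
             '2022-01-01'::timestamp
           , '2022-04-01'::timestamp
           , '1 month'
           ) AS month_start
    ), sales (sale_date, amount) AS (
      -- Made-up data with nothing in March.
      VALUES
          ('2022-01-15'::date, 100)
        , ('2022-02-03'::date, 40)
        , ('2022-04-20'::date, 60)
    )
    SELECT
        months.dte
      , COALESCE(SUM(sales.amount), 0) AS total
    FROM months
    LEFT JOIN sales
      ON DATE_TRUNC('month', sales.sale_date)::date = months.dte
    GROUP BY months.dte
    ORDER BY months.dte;

    Because the months drive the query, March still appears in the output with a total of 0 instead of silently disappearing.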

    In conclusion, the JSON functionality built into today’s relational databases is good for more than just schemaless data stores and complex values in cells. It can also be powerful as an intermediate step to help you manipulate and transform your data in useful ways.

  • Disconnected API Responses

    An alternate title for this post might be Processing API Requests with a Queue. We recently had a project where we were expecting a burst of high traffic and heavy load on some API endpoints. We wanted to make sure that we could handle all of the traffic, even if the processing time was affected; dropped requests were not an option. After quite a bit of research, the strategy in this post is what we came up with. In the end it worked well for our purposes, but we did identify some ways that we would improve it in the future.

    This same strategy will also work for any long-running API request, such as reporting, where you need to be able to make a request but it may take a very long (and possibly indeterminate) amount of time to complete. Please forgive the tone of the document; it is being adapted from my notes and isn’t in conversational form. There are likely some things that I should expand upon, so if anything needs clarification please feel free to ask in the comments.

  • Postgres numeric overflow error with json

    Recently I came across an error using Postgres that stumped me for a while, so I wanted to document it for next time. I was issuing an update statement and received the error: ERROR: value overflows numeric format. Not only did my statement not touch any numeric columns, the table itself didn’t have any. The actual problem ended up being some malformed JSON that I was trying to insert into a jsonb column. The JSON had a value like 300e715100, which was actually part of a hashed string, but the JSON serializer I was using incorrectly identified it as a very large number in scientific notation, and so did not quote it. Because Postgres cannot store a number that large, it throws the error. Quoting the value properly fixed the problem. I also want to note that the error does not happen with json, only with jsonb: jsonb parses the document and converts each number to a Postgres numeric, while json stores the input text as-is after checking its syntax.

    select '{"v": 300e715100}'::jsonb;
    -- ERROR:  value overflows numeric format
    -- LINE 1: select '{"v": 300e715100}'::jsonb
    
    select '{"v": 300e715100}'::json;
    -- this statement executes without error.

    You can try the code and see the error for yourself here: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=1584330f148ab0e9ed72529dfb466a12
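
    For comparison, here is a short sketch of the fixed version and of how jsonb treats numbers in general. These statements are my own additions, not part of the fiddle above.

    -- Quoted, the value is an ordinary JSON string and is accepted.
    select '{"v": "300e715100"}'::jsonb;

    -- jsonb_typeof shows how Postgres classified the value.
    select jsonb_typeof('{"v": "300e715100"}'::jsonb -> 'v');
    -- string

    -- A number with a small exponent is fine, and jsonb normalizes it on input.
    select '{"v": 3e5}'::jsonb;
    -- {"v": 300000}

    The last statement shows why only jsonb is affected: the number is actually converted on the way in, so anything too large to convert has to fail.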

  • Postgres update using subselect

    Today I learned that Postgres allows you to use a subselect in an update statement using a special syntax. This allows you to update a record from other data in the system easily (without remembering the weird update-with-join syntax) or to have an update syntax that more closely resembles an insert statement. For example:

    UPDATE my_table
    SET (foo, bar) =
    (SELECT 'foo', 'bar')
    WHERE id = 1;

    The subselect can be any query that returns at most one row; just return your columns in the same order as the column list you provide. Here is the relevant documentation if you want to read further: https://www.postgresql.org/docs/current/static/sql-update.html
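
    A slightly more realistic sketch pulls the new values from another table with a correlated subselect. The accounts and employees tables, their columns, and the sample rows are hypothetical, created here only so the example runs on its own.

    CREATE TEMP TABLE employees (id int PRIMARY KEY, first_name text, last_name text);
    CREATE TEMP TABLE accounts (
        id int PRIMARY KEY
      , sales_person_id int
      , contact_first_name text
      , contact_last_name text
    );
    INSERT INTO employees VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO accounts (id, sales_person_id) VALUES (10, 1), (11, NULL);

    -- Copy each account's contact name from its sales person.
    UPDATE accounts
    SET (contact_first_name, contact_last_name) =
      (SELECT first_name, last_name
       FROM employees
       WHERE employees.id = accounts.sales_person_id)
    WHERE accounts.sales_person_id IS NOT NULL;

    The WHERE clause on the update matters here: if the subselect finds no row for an account, the target columns are set to NULL, so it is worth restricting the update to the rows you expect to match.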

  • Map, Reduce and other Higher Order Functions

    There are few things I have learned in my programming career that have paid off like higher order functions. Map, Reduce and Filter with their cousins, along with the concept of passing functions as data in general make code easier to reason about, easier to write, easier to test. I find myself evangelizing these concepts often, so I thought I would try to do my best to give an introduction of them, along with some real world examples of how they can improve your everyday programming life. These examples are in JavaScript, but the concepts are universal.
