PostgreSQL performance

Yesterday I was playing with Documentum/PostgreSQL installed in a Docker container, trying to either prove or refute the following statement from the release notes:

Actually, it is not clear how EMC got such “promising” results, because everything I know about PostgreSQL tells me that the claim of a “15% performance gap” is too optimistic. Let me explain.

First of all, I do not want to put any blame on PostgreSQL – it is just a database that plays in the second league (for the record: Oracle plays in the premier league, MSSQL in the first league). I am not going to explain thoroughly why I think so, but the basic idea is that PostgreSQL largely lacks diagnostics and backup/recovery tools, and the tools that do exist are junk – the example below demonstrates how “explain analyze” lies about execution time:

--
-- I create a 5GB table in the database and PostgreSQL reports that
-- it is able to read all data from this table in about 1 ms.
-- Fuck yeah, PostgreSQL turned my MBP into a supercomputer
--
postgres=# create table t1 as select lpad('x',100,'x') as a1, 
postgres-#                  (SELECT array_to_string(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) 
postgres(#                    FROM generate_series(1,1024*1024)), '')
postgres(#                  ) as a2 from generate_series(1,5*1024);
SELECT 5120
postgres=# explain analyze select a2 from t1;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..145.20 rows=5120 width=18) (actual time=0.023..0.758 rows=5120 loops=1)
 Planning time: 3.140 ms
 Execution time: 0.963 ms
(3 rows)

postgres=# explain analyze select a2 from t1;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..145.20 rows=5120 width=18) (actual time=0.010..1.056 rows=5120 loops=1)
 Planning time: 0.033 ms
 Execution time: 1.380 ms
(3 rows)

postgres=# 
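
A big part of why the number looks this good is presumably that the large a2 values are stored out of line in the TOAST table, and this “explain analyze” never detoasts them. A quick way to see the gap on the same t1 table (assuming psql’s \timing is available; the query below is only an illustration) is to force every value to actually be read:

\timing on
-- unlike the bare "explain analyze select a2 from t1" above, length()
-- forces every a2 value to be detoasted and read
select sum(length(a2)) from t1;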

Now, here is why I doubt that a “15% performance gap” is realistic.

The main problem is the MVCC implementation in PostgreSQL – every time you update a row, PostgreSQL creates a new version of it and stores it in another place – note how the ctid changes on every update:

postgres=# create table t1(a1 int);
CREATE TABLE
postgres=# insert into t1 values(1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# select ctid, a1 from t1;
 ctid  | a1 
-------+----
 (0,1) |  1
(1 row)

postgres=# update t1 set a1=2;
UPDATE 1
postgres=# select ctid, a1 from t1;
 ctid  | a1 
-------+----
 (0,2) |  2
(1 row)

postgres=# update t1 set a1=3;
UPDATE 1
postgres=# select ctid, a1 from t1;
 ctid  | a1 
-------+----
 (0,3) |  3
(1 row)

Such behaviour leads to the following conclusions:

  • wide tables do not like updates – every new row version copies a lot of data
  • every update affects all table indexes (the sketch after this list illustrates both points)
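
A minimal sketch of both effects (the table, index and sizes below are made up for illustration, not taken from Documentum): update every row once and compare on-disk sizes before and after – until VACUUM reclaims the dead versions, the heap roughly doubles and the index grows as well.

create table wide_t as
  select g as id, repeat(md5(g::text), 10) as payload  -- ~320 bytes per row
  from generate_series(1, 100000) g;
create index wide_t_id_idx on wide_t(id);

select pg_size_pretty(pg_relation_size('wide_t'))        as heap_before,
       pg_size_pretty(pg_relation_size('wide_t_id_idx')) as index_before;

-- the indexed column changes, so every row gets a complete new heap
-- version plus a new index entry; the old ones stay behind as dead tuples
update wide_t set id = id + 1;

select pg_size_pretty(pg_relation_size('wide_t'))        as heap_after,
       pg_size_pretty(pg_relation_size('wide_t_id_idx')) as index_after;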

My experience (a single-user comparison with Oracle) is the following; a rough harness for such runs is sketched after the list:

  • inserts (creation of objects) are 15% slower
  • updates are 50% slower
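
For what it is worth, a single-user run along those lines can be timed with psql’s \timing (the table name and row count are purely illustrative; no results implied):

\timing on
create table bench(id int primary key, payload text);
-- "insert" case: create a batch of new rows
insert into bench select g, md5(g::text) from generate_series(1, 100000) g;
-- "update" case: rewrite every row once
update bench set payload = md5((id + 1)::text);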
