A database optimisation story

In the days of yore (November 2024), before there was a ‘vibe’ before ‘code’, I was to optimise a whole suite of queries in PostgreSQL.

We had ten or so queries to generate a set of reporting tables for analytics once a day—a denormalised view of our transactional tables. These queries were huge and slow.

I did what any well-bred engineer must do. I took the slowest query and:

EXPLAIN ANALYZE <my slow query>

After waiting a cool 13 minutes, I was met with a rather friendly output that approximately looked like this.

Rather Friendly Output

                                                                                                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=147483011.14..167439265.44 rows=164294334 width=1144) (actual time=722391.777..810633.458 rows=30302109 loops=1)
   Buffers: shared hit=1741739 read=6483035, temp read=8284857 written=8284920
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.006 rows=1 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
   InitPlan 3 (returns $2)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
   InitPlan 4 (returns $3)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
   InitPlan 5 (returns $4)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
   InitPlan 6 (returns $5)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=1)
   InitPlan 7 (returns $6)
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
   ->  Gather Merge  (cost=147483011.06..166617793.68 rows=164294334 width=1144) (actual time=722391.776..802478.439 rows=30302466 loops=1)
         Workers Planned: 2
         Params Evaluated: $0, $1, $2, $3, $4, $5, $6
         Workers Launched: 2
         Buffers: shared hit=1741739 read=6483035, temp read=8284857 written=8284920
         ->  Sort  (cost=147482011.03..147653150.96 rows=68455972 width=1144) (actual time=715857.887..740498.116 rows=10100822 loops=3)
               Sort Key: p.id, (date(generate_series.generate_series))
               Sort Method: external merge  Disk: 8018120kB
               Buffers: shared hit=1741739 read=6483035, temp read=8284857 written=8284920
               Worker 0:  Sort Method: external merge  Disk: 7355864kB
               Worker 1:  Sort Method: external merge  Disk: 8640448kB
               ->  Hash Join  (cost=98111314.34..118564254.21 rows=68455972 width=1144) (actual time=492147.822..658887.066 rows=10100822 loops=3)
                     Hash Cond: (p.assigned_facility_id = facilities_1.id)
                     Buffers: shared hit=1741715 read=6483035, temp read=5283053 written=5283097
                     ->  Hash Join  (cost=98109366.95..104872071.62 rows=91127708 width=771) (actual time=492119.505..622548.440 rows=10100822 loops=3)
                           Hash Cond: (p.registration_facility_id = facilities.id)
                           Buffers: shared hit=1725066 read=6483035, temp read=5283053 written=5283097
                           ->  Parallel Hash Left Join  (cost=98107419.55..103503670.57 rows=121380417 width=567) (actual time=492093.262..616028.266 rows=10100822 loops=3)
                                 Hash Cond: (p.id = mh.patient_id)
                                 Buffers: shared hit=1708494 read=6483035, temp read=5283053 written=5283097
                                 ->  Parallel Hash Left Join  (cost=98080217.47..102515540.51 rows=121380417 width=546) (actual time=491136.330..609022.395 rows=10100703 loops=3)
                                       Hash Cond: (((date(generate_series.generate_series)) = current_meds.month_date) AND (p.id = current_meds.patient_id))
                                       Buffers: shared hit=1708487 read=6464916, temp read=5283053 written=5283097
                                       ->  Merge Left Join  (cost=97339331.57..101137407.41 rows=121380417 width=490) (actual time=478426.540..586040.062 rows=10100703 loops=3)
                                             Merge Cond: ((p.id = visits.patient_id) AND ((date_part('month'::text, (date(generate_series.generate_series))::timestamp without time zone)) = visits.month) AND ((date_part('year'::text, (date(generate_series.generate_series))::timestamp without time zone)) = visits.year))
                                             Buffers: shared hit=1708456 read=6032108, temp read=5283053 written=5283097
                                             ->  Merge Left Join  (cost=90908457.78..93415961.47 rows=121380417 width=380) (actual time=328651.162..381108.622 rows=10100703 loops=3)
                                                   Merge Cond: ((p.id = bss.patient_id) AND ((date_part('month'::text, (date(generate_series.generate_series))::timestamp without time zone)) = bss.month) AND ((date_part('year'::text, (date(generate_series.generate_series))::timestamp without time zone)) = bss.year))
                                                   Buffers: shared hit=877268 read=3615121, temp read=3737608 written=3737637
                                                   ->  Merge Left Join  (cost=87713534.81..89105634.05 rows=121380417 width=297) (actual time=260760.226..283927.994 rows=10100703 loops=3)
                                                         Merge Cond: ((p.id = bps.patient_id) AND ((date_part('month'::text, (date(generate_series.generate_series))::timestamp without time zone)) = bps.month) AND ((date_part('year'::text, (date(generate_series.generate_series))::timestamp without time zone)) = bps.year))
                                                         Buffers: shared hit=637268 read=2409751, temp read=2946655 written=2946675
                                                         ->  Sort  (cost=84884016.92..85187467.96 rows=121380417 width=233) (actual time=223534.131..229667.871 rows=10100703 loops=3)
                                                               Sort Key: p.id, (date_part('month'::text, (date(generate_series.generate_series))::timestamp without time zone)), (date_part('year'::text, (date(generate_series.generate_series))::timestamp without time zone))
                                                               Sort Method: external merge  Disk: 2017832kB
                                                               Buffers: shared hit=12271 read=1324196, temp read=2393566 written=2393583
                                                               Worker 0:  Sort Method: external merge  Disk: 1852792kB
                                                               Worker 1:  Sort Method: external merge  Disk: 2166392kB
                                                               ->  Merge Left Join  (cost=58777872.69..60566753.73 rows=121380417 width=233) (actual time=170776.950..198032.261 rows=10100703 loops=3)
                                                                     Merge Cond: ((p.id = past_meds.patient_id) AND ((date(generate_series.generate_series)) = ((past_meds.month_date + '1 mon'::interval))))
                                                                     Buffers: shared hit=12256 read=1324196, temp read=1638939 written=1638952
                                                                     ->  Sort  (cost=53609605.02..53913056.06 rows=121380417 width=177) (actual time=119032.085..122431.595 rows=10100703 loops=3)
                                                                           Sort Key: p.id, (date(generate_series.generate_series))
                                                                           Sort Method: external merge  Disk: 1469424kB
                                                                           Buffers: shared hit=10003 read=28001, temp read=549504 written=549508
                                                                           Worker 0:  Sort Method: external merge  Disk: 1348976kB
                                                                           Worker 1:  Sort Method: external merge  Disk: 1577632kB
                                                                           ->  Nested Loop Left Join  (cost=0.43..30993327.29 rows=121380417 width=177) (actual time=1.080..97127.461 rows=10100703 loops=3)
                                                                                 Join Filter: (to_char(timezone($6, ti
                                                                                 Rows Removed by Join Filter: 14084142
                                                                                 Buffers: shared hit=10003 read=28001
                                                                                 ->  Parallel Index Scan using index_patients_on_registration_facility_id on patients p  (cost=0.42..37700.88 rows=364141 width=85) (actual time=0.740..947.656 rows=291384 loops=3)
                                                                                       Filter: (deleted_at IS NULL)
                                                                                       Rows Removed by Filter: 3
                                                                                       Buffers: shared hit=10003 read=28001
                                                                                 ->  Function Scan on generate_series  (cost=0.01..50.01 rows=1000 width=92) (actual time=0.003..0.210 rows=83 loops=874151)
                                                                     ->  Materialize  (cost=5168267.67..5316144.58 rows=29575382 width=76) (actual time=51744.779..64303.050 rows=30296480 loops=3)
                                                                           Buffers: shared hit=2253 read=1296195, temp read=1089435 written=1089444
                                                                           ->  Sort  (cost=5168267.67..5242206.12 rows=29575382 width=76) (actual time=51744.772..59383.676 rows=30296480 loops=3)
                                                                                 Sort Key: past_meds.patient_id, ((past_meds.month_date + '1 mon'::interval))
                                                                                 Sort Method: external merge  Disk: 2905160kB
                                                                                 Buffers: shared hit=2253 read=1296195, temp read=1089435 written=1089444
                                                                                 Worker 0:  Sort Method: external merge  Disk: 2905160kB
                                                                                 Worker 1:  Sort Method: external merge  Disk: 2905160kB
                                                                                 ->  Seq Scan on reporting_prescriptions past_meds  (cost=0.00..728562.82 rows=29575382 width=76) (actual time=0.028..18434.616 rows=30296531 loops=3)
                                                                                       Buffers: shared hit=2232 read=1296195
                                                         ->  Materialize  (cost=2829517.89..2900638.36 rows=14224095 width=96) (actual time=37226.040..44417.758 rows=14220140 loops=3)
                                                               Buffers: shared hit=624997 read=1085555, temp read=553089 written=553092
                                                               ->  Sort  (cost=2829517.89..2865078.12 rows=14224095 width=96) (actual time=37226.030..42058.968 rows=14220140 loops=3)
                                                                     Sort Key: bps.patient_id, bps.month, bps.year
                                                                     Sort Method: external merge  Disk: 1474904kB
                                                                     Buffers: shared hit=624997 read=1085555, temp read=553089 written=553092
                                                                     Worker 0:  Sort Method: external merge  Disk: 1474904kB
                                                                     Worker 1:  Sort Method: external merge  Disk: 1474904kB
                                                                     ->  Seq Scan on reporting_patient_blood_pressures bps  (cost=0.00..712424.95 rows=14224095 width=96) (actual time=3.005..20115.049 rows=14220155 loops=3)
                                                                           Buffers: shared hit=624997 read=1085555
                                                   ->  Materialize  (cost=3194922.97..3276716.30 rows=16358665 width=115) (actual time=67890.854..88371.933 rows=16359636 loops=3)
                                                         Buffers: shared hit=240000 read=1205370, temp read=790953 written=790962
                                                         ->  Sort  (cost=3194922.97..3235819.63 rows=16358665 width=115) (actual time=67890.846..85576.596 rows=16359636 loops=3)
                                                               Sort Key: bss.patient_id, bss.month, bss.year
                                                               Sort Method: external merge  Disk: 2109200kB
                                                               Buffers: shared hit=240000 read=1205370, temp read=790953 written=790962
                                                               Worker 0:  Sort Method: external merge  Disk: 2109216kB
                                                               Worker 1:  Sort Method: external merge  Disk: 2109208kB
                                                               ->  Seq Scan on reporting_patient_blood_sugars bss  (cost=0.00..645376.65 rows=16358665 width=115) (actual time=3.289..18358.264 rows=16359684 loops=3)
                                                                     Buffers: shared hit=240000 read=1205370
                                             ->  Materialize  (cost=6430873.80..6582540.11 rows=30333262 width=142) (actual time=149775.245..191096.813 rows=30296481 loops=3)
                                                   Buffers: shared hit=831188 read=2416987, temp read=1545445 written=1545460
                                                   ->  Sort  (cost=6430873.80..6506706.95 rows=30333262 width=142) (actual time=149775.240..185877.227 rows=30296481 loops=3)
                                                         Sort Key: visits.patient_id, visits.month, visits.year
                                                         Sort Method: external merge  Disk: 4121184kB
                                                         Buffers: shared hit=831188 read=2416987, temp read=1545445 written=1545460
                                                         Worker 0:  Sort Method: external merge  Disk: 4121192kB
                                                         Worker 1:  Sort Method: external merge  Disk: 4121184kB
                                                         ->  Seq Scan on reporting_patient_visits visits  (cost=0.00..1386057.62 rows=30333262 width=142) (actual time=0.425..56957.588 rows=30296531 loops=3)
                                                               Buffers: shared hit=831188 read=2416987
                                       ->  Parallel Hash  (cost=556039.76..556039.76 rows=12323076 width=76) (actual time=12653.113..12653.114 rows=10098844 loops=3)
                                             Buckets: 33554432  Batches: 1  Memory Usage: 3582336kB
                                             Buffers: shared hit=1 read=432808
                                             ->  Parallel Seq Scan on reporting_prescriptions current_meds  (cost=0.00..556039.76 rows=12323076 width=76) (actual time=0.827..7166.755 rows=10098844 loops=3)
                                                   Buffers: shared hit=1 read=432808
                                 ->  Parallel Hash  (cost=22159.81..22159.81 rows=403381 width=37) (actual time=955.126..955.127 rows=320948 loops=3)
                                       Buckets: 1048576  Batches: 1  Memory Usage: 80000kB
                                       Buffers: shared hit=7 read=18119
                                       ->  Parallel Seq Scan on medical_histories mh  (cost=0.00..22159.81 rows=403381 width=37) (actual time=0.330..813.405 rows=320948 loops=3)
                                             Filter: (deleted_at IS NULL)
                                             Rows Removed by Filter: 2
                                             Buffers: shared hit=7 read=18119
                           ->  Hash  (cost=1925.82..1925.82 rows=1726 width=236) (actual time=26.228..26.237 rows=2284 loops=3)
                                 Buckets: 4096 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 588kB
                                 Buffers: shared hit=16572
                                 ->  Merge Join  (cost=1609.98..1925.82 rows=1726 width=236) (actual time=22.896..24.844 rows=2284 loops=3)
                                       Merge Cond: (org_regions.path = (subpath(state_regions.path, 0, '-1'::integer)))
                                       Buffers: shared hit=16572
                                       ->  Index Scan using index_regions_on_unique_path on regions org_regions  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.012..0.551 rows=1543 loops=3)
                                             Buffers: shared hit=3183
                                       ->  Sort  (cost=1609.57..1613.89 rows=1726 width=316) (actual time=22.877..23.036 rows=2284 loops=3)
                                             Sort Key: (subpath(state_regions.path, 0, '-1'::integer))
                                             Sort Method: quicksort  Memory: 1237kB
                                             Buffers: shared hit=13389
                                             Worker 0:  Sort Method: quicksort  Memory: 1237kB
                                             Worker 1:  Sort Method: quicksort  Memory: 1237kB
                                             ->  Merge Join  (cost=1200.92..1516.77 rows=1726 width=316) (actual time=17.976..20.749 rows=2284 loops=3)
                                                   Merge Cond: (state_regions.path = (subpath(district_regions.path, 0, '-1'::integer)))
                                                   Buffers: shared hit=13389
                                                   ->  Index Scan using index_regions_on_unique_path on regions state_regions  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.010..0.849 rows=2505 loops=3)
                                                         Buffers: shared hit=4245
                                                   ->  Sort  (cost=1200.52..1204.83 rows=1726 width=278) (actual time=17.958..18.131 rows=2284 loops=3)
                                                         Sort Key: (subpath(district_regions.path, 0, '-1'::integer))
                                                         Sort Method: quicksort  Memory: 1234kB
                                                         Buffers: shared hit=9144
                                                         Worker 0:  Sort Method: quicksort  Memory: 1234kB
                                                         Worker 1:  Sort Method: quicksort  Memory: 1234kB
                                                         ->  Merge Join  (cost=791.87..1107.72 rows=1726 width=278) (actual time=12.540..15.719 rows=2284 loops=3)
                                                               Merge Cond: (district_regions.path = (subpath(block_regions.path, 0, '-1'::integer)))
                                                               Buffers: shared hit=9144
                                                               ->  Index Scan using index_regions_on_unique_path on regions district_regions  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.009..0.962 rows=2538 loops=3)
                                                                     Buffers: shared hit=4272
                                                               ->  Sort  (cost=791.47..795.78 rows=1726 width=240) (actual time=12.522..12.716 rows=2284 loops=3)
                                                                     Sort Key: (subpath(block_regions.path, 0, '-1'::integer))
                                                                     Sort Method: quicksort  Memory: 1235kB
                                                                     Buffers: shared hit=4872
                                                                     Worker 0:  Sort Method: quicksort  Memory: 1235kB
                                                                     Worker 1:  Sort Method: quicksort  Memory: 1235kB
                                                                     ->  Merge Join  (cost=382.82..698.67 rows=1726 width=240) (actual time=6.956..10.344 rows=2284 loops=3)
                                                                           Merge Cond: (block_regions.path = (subpath(facility_regions.path, 0, '-1'::integer)))
                                                                           Buffers: shared hit=4872
                                                                           ->  Index Scan using index_regions_on_unique_path on regions block_regions  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.012..0.982 rows=2555 loops=3)
                                                                                 Buffers: shared hit=4290
                                                                           ->  Sort  (cost=382.42..386.73 rows=1726 width=202) (actual time=6.933..7.139 rows=2285 loops=3)
                                                                                 Sort Key: (subpath(facility_regions.path, 0, '-1'::integer))
                                                                                 Sort Method: quicksort  Memory: 1252kB
                                                                                 Buffers: shared hit=582
                                                                                 Worker 0:  Sort Method: quicksort  Memory: 1252kB
                                                                                 Worker 1:  Sort Method: quicksort  Memory: 1252kB
                                                                                 ->  Hash Join  (cost=128.73..289.62 rows=1726 width=202) (actual time=1.610..3.597 rows=2299 loops=3)
                                                                                       Hash Cond: (facility_regions.source_id = facilities.id)
                                                                                       Buffers: shared hit=582
                                                                                       ->  Seq Scan on regions facility_regions  (cost=0.00..154.90 rows=2277 width=172) (actual time=0.008..0.883 rows=2299 loops=3)
                                                                                             Filter: ((region_type)::text = 'facility'::text)
                                                                                             Rows Removed by Filter: 711
                                                                                             Buffers: shared hit=351
                                                                                       ->  Hash  (cost=99.99..99.99 rows=2299 width=30) (actual time=1.567..1.568 rows=2299 loops=3)
                                                                                             Buckets: 4096  Batches: 1  Memory Usage: 175kB
                                                                                             Buffers: shared hit=231
                                                                                             ->  Seq Scan on facilities  (cost=0.00..99.99 rows=2299 width=30) (actual time=0.008..0.888 rows=2299 loops=3)
                                                                                                   Buffers: shared hit=231
                     ->  Hash  (cost=1925.82..1925.82 rows=1726 width=236) (actual time=28.099..28.106 rows=2284 loops=3)
                           Buckets: 4096 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 588kB
                           Buffers: shared hit=16583
                           ->  Merge Join  (cost=1609.98..1925.82 rows=1726 width=236) (actual time=24.097..26.482 rows=2284 loops=3)
                                 Merge Cond: (org_regions_1.path = (subpath(state_regions_1.path, 0, '-1'::integer)))
                                 Buffers: shared hit=16583
                                 ->  Index Scan using index_regions_on_unique_path on regions org_regions_1  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.029..0.671 rows=1543 loops=3)
                                       Buffers: shared hit=3185
                                 ->  Sort  (cost=1609.57..1613.89 rows=1726 width=316) (actual time=24.061..24.250 rows=2284 loops=3)
                                       Sort Key: (subpath(state_regions_1.path, 0, '-1'::integer))
                                       Sort Method: quicksort  Memory: 1237kB
                                       Buffers: shared hit=13398
                                       Worker 0:  Sort Method: quicksort  Memory: 1237kB
                                       Worker 1:  Sort Method: quicksort  Memory: 1237kB
                                       ->  Merge Join  (cost=1200.92..1516.77 rows=1726 width=316) (actual time=18.422..21.656 rows=2284 loops=3)
                                             Merge Cond: (state_regions_1.path = (subpath(district_regions_1.path, 0, '-1'::integer)))
                                             Buffers: shared hit=13389
                                             ->  Index Scan using index_regions_on_unique_path on regions state_regions_1  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.003..0.986 rows=2505 loops=3)
                                                   Buffers: shared hit=4245
                                             ->  Sort  (cost=1200.52..1204.83 rows=1726 width=278) (actual time=18.412..18.605 rows=2284 loops=3)
                                                   Sort Key: (subpath(district_regions_1.path, 0, '-1'::integer))
                                                   Sort Method: quicksort  Memory: 1234kB
                                                   Buffers: shared hit=9144
                                                   Worker 0:  Sort Method: quicksort  Memory: 1234kB
                                                   Worker 1:  Sort Method: quicksort  Memory: 1234kB
                                                   ->  Merge Join  (cost=791.87..1107.72 rows=1726 width=278) (actual time=12.861..16.132 rows=2284 loops=3)
                                                         Merge Cond: (district_regions_1.path = (subpath(block_regions_1.path, 0, '-1'::integer)))
                                                         Buffers: shared hit=9144
                                                         ->  Index Scan using index_regions_on_unique_path on regions district_regions_1  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.007..0.995 rows=2538 loops=3)
                                                               Buffers: shared hit=4272
                                                         ->  Sort  (cost=791.47..795.78 rows=1726 width=240) (actual time=12.845..13.038 rows=2284 loops=3)
                                                               Sort Key: (subpath(block_regions_1.path, 0, '-1'::integer))
                                                               Sort Method: quicksort  Memory: 1235kB
                                                               Buffers: shared hit=4872
                                                               Worker 0:  Sort Method: quicksort  Memory: 1235kB
                                                               Worker 1:  Sort Method: quicksort  Memory: 1235kB
                                                               ->  Merge Join  (cost=382.82..698.67 rows=1726 width=240) (actual time=7.102..10.672 rows=2284 loops=3)
                                                                     Merge Cond: (block_regions_1.path = (subpath(facility_regions_1.path, 0, '-1'::integer)))
                                                                     Buffers: shared hit=4872
                                                                     ->  Index Scan using index_regions_on_unique_path on regions block_regions_1  (cost=0.41..270.89 rows=3032 width=156) (actual time=0.006..1.197 rows=2555 loops=3)
                                                                           Buffers: shared hit=4290
                                                                     ->  Sort  (cost=382.42..386.73 rows=1726 width=202) (actual time=7.084..7.286 rows=2285 loops=3)
                                                                           Sort Key: (subpath(facility_regions_1.path, 0, '-1'::integer))
                                                                           Sort Method: quicksort  Memory: 1252kB
                                                                           Buffers: shared hit=582
                                                                           Worker 0:  Sort Method: quicksort  Memory: 1252kB
                                                                           Worker 1:  Sort Method: quicksort  Memory: 1252kB
                                                                           ->  Hash Join  (cost=128.73..289.62 rows=1726 width=202) (actual time=1.523..3.795 rows=2299 loops=3)
                                                                                 Hash Cond: (facility_regions_1.source_id = facilities_1.id)
                                                                                 Buffers: shared hit=582
                                                                                 ->  Seq Scan on regions facility_regions_1  (cost=0.00..154.90 rows=2277 width=172) (actual time=0.018..1.169 rows=2299 loops=3)
                                                                                       Filter: ((region_type)::text = 'facility'::text)
                                                                                       Rows Removed by Filter: 711
                                                                                       Buffers: shared hit=351
                                                                                 ->  Hash  (cost=99.99..99.99 rows=2299 width=30) (actual time=1.474..1.475 rows=2299 loops=3)
                                                                                       Buckets: 4096  Batches: 1  Memory Usage: 175kB
                                                                                       Buffers: shared hit=231
                                                                                       ->  Seq Scan on facilities facilities_1  (cost=0.00..99.99 rows=2299 width=30) (actual time=0.012..0.898 rows=2299 loops=3)
                                                                                             Buffers: shared hit=231
 Planning:
   Buffers: shared hit=1242 read=34
 Planning Time: 26.536 ms
 Execution Time: 813398.097 ms
(246 rows)

We have the good folk such as Hubert “depesz” Lubaczewski who has blessed us with a rather fine tool for dealing with Rather Friendly Outputs.

explain.depesz.com

My Rather Friendly Output is friendlier still, as I have a big red box honking at me pointing to what has gone wrong. But my well-bred engineer brain thought: Now What?!

Sir depesz in his rather fine tool hinted at a work_mem problem. Ah yes, a knob that I can turn for the duration of these overnight queries. This is a more useful lever for me than rewriting a 200 line SQL query that joins the world.

But how must I think and reason about this? What can my hardware support? I had little experience with PostgreSQL knobs.

A month before the aforementioned day of yore, October 2024, people were saying glowing things about a model now known as Sonnet 3.6, then known as Sonnet 3.5 (new). Simpler times. What matters here is that it was the clear best model when there were clear best AI models.

I did what few well-bred engineers where doing at the time. I shoved my plan and my hardware spec into Sir Sonnet 3.5 (new).

This was my prompt:

This is a PostgreSQL EXPLAIN ANALYZE output

Tell me what configuration parameters I can tune to make this go faster.

<system_info>
RAM = 32 GB
Current parameters:
          synchronous_commit: "on"
          max_connections: "200"
          shared_buffers: "8GB"
          effective_cache_size: "24GB"
          maintenance_work_mem: "2GB"
          checkpoint_completion_target: "0.9"
          wal_buffers: "16MB"
          default_statistics_target: "100"
          random_page_cost: "1.1"
          effective_io_concurrency: "200"
          work_mem: "20971kB"  # NOTE: but for just this query, we temporarily have set it to 1.5GB, which is the most we can push it safely
          min_wal_size: "1GB"
          max_wal_size: "4GB"
          max_worker_processes: "4"
          max_parallel_workers_per_gather: "2"
          max_parallel_workers: "4"
          max_parallel_maintenance_workers: "2"
</system_info>

Sir Sonnet said so:

# PostgreSQL Performance Analysis

## Key Performance Metrics
- Total Execution Time: 813.4 seconds (13.5 minutes)
- Planning Time: 26.5ms
- Total Rows: 30,302,109
- Buffer Usage:
  - Shared Hit: 1,741,739
  - Shared Read: 6,483,035
  - Temp Read: 8,284,857
  - Temp Written: 8,284,920

## Major Performance Issues Identified

1. **Heavy Disk I/O for Sorting**
   - Multiple large external sorts occurring
   - Largest sort using 8018MB of disk space
   - Many temp buffers being read and written

2. **Parallel Query Execution Limitations**
   - Only using 2 parallel workers
   - Several large sequential scans that could benefit from more parallelism

3. **Memory Pressure**
   - Multiple operations spilling to disk despite high work_mem
   - Large hash joins and sorts requiring temporary files

## Recommended Parameter Adjustments

### Critical Changes

1. `max_parallel_workers_per_gather`: 4 (currently 2)
   - Justification: Query shows heavy parallel operations with large sequential scans
   - This change alone could significantly improve the performance of the parallel operations

2. `work_mem`: 2GB (currently 1.5GB)
   - Justification: Multiple sorts spilling to disk
   - With 32GB RAM and considering the parallel workers, we can safely increase this
   - Note: Effective memory per sort will be work_mem × number of parallel workers

3. `max_parallel_workers`: 8 (currently 4)
   - Justification: System has enough RAM to support more parallel workers
   - Will allow better utilization of parallel operations

### Secondary Adjustments

4. `maintenance_work_mem`: 4GB (currently 2GB)
   - Justification: Large sorts occurring in maintenance operations
   - System has enough RAM to support this increase

5. `effective_io_concurrency`: 300 (currently 200)
   - Justification: Heavy I/O operations occurring
   - If using SSDs, this can be increased further

6. `random_page_cost`: 1.0 (currently 1.1)
   - Justification: If using SSDs, this should be lowered to better reflect actual I/O costs

## Monitoring Recommendations

After implementing these changes:
1. Monitor temp file usage (`pg_stat_database`)
2. Watch for memory pressure using `pg_stat_activity`
3. Track parallel execution effectiveness with `pg_stat_statements`

Not all the recommendations made sense, but the critical ones seem good. So I obeyed the AI and reran the query.

And my database crashed.

I committed the original sin. I vibed too close to the sun.

Exasperated, I went back to Sir Sonnet.

I get this issue whenever I try to run a long query in my postgresql console in production:


```
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Failed.
!?>
```

Help me fix this

My effective_shared_cache is 24GB.

Setting my work_mem to 2GB just for this one long-running query causes a crash. Why is that?

Sir Sonnet responded,

Ah, this is a crucial detail. The issue is that work_mem is allocated PER OPERATION in a query plan,
not per query or per connection. This is a common misconception that can lead to server crashes.
Let's break this down:

If your query has multiple operations (like sorts, hash joins, etc.), each operation can use up to
the specified work_mem
If your query runs in parallel, each parallel worker gets its own allocation
Multiple concurrent connections each get their own allocations

For example, if you have:

work_mem = 2GB
A query with 4 sort operations
Running with a parallel worker degree of 3
5 concurrent users running similar queries

The potential memory usage could be:
```
2GB * 4 operations * 3 parallel workers * 5 concurrent users = 120GB
```

This is why the server crashes - it's trying to allocate much more memory than available!

At this point, I want you to imagine a few more iterations, a bit or railing against Sir Sonnet’s confident confabulations.

At the end of the process, I had a significantly faster query. I played with a bunch of PostgreSQL knobs and made a matrix.

parameters	gather merge disk IO		query time (ms)
	bytes read	bytes written
(current setting) work_mem = 20971kB, gather merge workers: 2	52GB	52GB	1,622,533
work_mem = 1.5GB, gather merge workers: 2	23GB	23GB	813,398
work_mem = 1GB, gather merge workers: 2	23GB	23GB	874,059
work_mem = 1GB, gather merge workers: 3	23GB	23GB	888,341
work_mem = 1GB, gather merge workers: 3, effective_io_concurrency: 400	23GB	23GB	903,961
work_mem = 1GB, gather merge workers: 2, random_io_cost: 1.1	23GB	23GB	849,473

Is this something a human could most definitely do well? Yes indeed. I exchanged notes with Srihari, and someone with deep experience such as he would have figured this out in an afternoon.

Where Sir Sonnet helped me is that I could, as someone who has never turned a knob on a PostgreSQL database, could be as productive as a Srihari. And my adversarial relationship with Sir Sonnet made me learn about PostgreSQL internals at unprecedented pace.

LLMs are abundantly smarter than the days of yore (November 2024). The models can reason intelligently and use tools better than before. There’s more to do with it than vibe code.