Use XPath in PL/PGSQL for Batch Processing

A while ago I had to process XML loaded in a PostgreSQL database. The task was to extract references from a publication structure. Fortunately, PostgreSQL allows you to use XPath in plpgsql, but there are some things you need to be aware. First, you need to note all namespaces:

Then, you can use the […]

Migrate PostgreSQL Between Two Machines

TL;DR: Once you prepare the connection, pg_dump is your friend. Recently, we had to migrate a large-ish DB (1.5 TB) from a Windows installation to a Linux one in order to accommodate a series of third party extensions. We have split the process in three: Prepare the migration Migrate users Migrate the database Wrap up […]

Add XML to PostgreSQL from Python

One of the projects I worked on was to import a large number of XML files in a PostgreSQL database (as XML files). I chose python to do it. Here are the steps: Database The data source is the SCOPUS database which has a silly number of entries (approx 20,000,000), most containing 2 XML files […]

Add Primary Key Column to PostgreSQL Table

The other day I got a question: How to add a numeric PK column to an already existent table? A quick look on the net showed that it is indeed possible to add an auto-incrementing PK to an already existing table (if you don’t have already a PK that is!). The code is quite simple […]

Fuzzy String Matching in PostgreSQL

A database I am working on has a lot of records (40M+) and the guys mining it are currently using python to process it. This is very powerful (nltk) but also quite slow. One of the guys has found that you can do fuzzy string matching straight in Postgres and the feature is quite advanced! […]

Postgres: Show Schema Sizes in a Database

While loading a large database from files to a Postgres schema, I found very useful to know how much space on disk it actually occupies (just in case I need to re-size the disk). I found this answer online:

Which results in a list like: HTH, PS: All credit goes to this StackOverflow answer! […]

Load a CSV File with Header in Postgres via Psycopg

My task is to create a script to load a bunch of files into a DB model. The files are CSV, with quoted strings and headers. Postgres has a COPY command which I find fabulous for loading data. unfortunately, the psycopg implementations copy_from() , copy_to() and copy_expert() have very few examples and I found their […]

Insert in PostgreSQL in (Almost) Constant Time

One of the things I did was to write a script to insert records in a PostgreSQL database. The records have with a quite simple structure: id – char(20) content – XML field aside – XML field I was working with a sample of 10,000 entries out of 40 million and the script went through […]

PostgreSQL: Call a Stored Procedure for Every Row

I guess that newbies in PostgreSQL like me have first question when they create a store procedure: How do I call it? The answer is simple; you just execute

. Then, usually immediately the second question pops up: I wrote a procedure which I need to call for each row in a table. How do […]

Scroll to top