Load a CSV File with Header in Postgres via Psycopg

My task is to create a script to load a bunch of files into a DB model. The files are CSV, with quoted strings and headers.

Postgres has a COPY command which I find fabulous for loading data. unfortunately, the psycopg implementations copy_from() , copy_to() and copy_expert() have very few examples and I found their usage a bit of a challenge.

Solution

Initially, I thought of opening the file, parsing it myself and perform a copy_from() like in the other script dealing with XML (aka lots of work).

I had a glimmer of hope when I found an answer suggesting the use of copy_expert(sql, file) in psycopg. I’ve been then able to tweak the example to suit my purpose.

copy_expert() requires a SQL statement and a file-like object. Therefore, first step is to prepare the SQL statement for my file layout:

This statement is a “classic” COPY FROM statement, specifying the input is a CSV file, has a header row, the delimiter is a comma. The “%s” I use to specify the destination table as a parameter.

The second step is to open the file:

I created a function to perform the load:

I can use it like so:

Simple.

Note: Please verify the SQL_STATEMENT matches your format (e.g. right delimiter, right quote…). Also, make sure the first row of the file matches the column names in the table (for the columns that are present).


A little experiment: If you find this post and ad below useful, please check the ad out :-)




11 thoughts on “Load a CSV File with Header in Postgres via Psycopg

  1. Thanks a lot for this example! I have been searching quite a while on how to use the psycopg2 implementations and could now finally finish my import script.

    Cheers

  2. Thank you very much! I was having trouble with copying CSV files with header. And didn’t want to handle the header separately inside Python. This works great!

  3. Hi. I decided to give this a try with a bulk import of csv files but i get and error. Any help would be appreciated. Thanks.

    I am receiving this error:

    1. btw Im not sure how I got the second SQL_STATEMENT in there, but it makes no difference on the outcome (that I know of).

    2. First thing that comes to mind is the “\\” on the path stuff (path = ‘D:\directory\\path\\’ to path = ‘D:\\directory\\path\\’)

  4. Hi,
    My CSV doesn’t respect the order of columns as per the tables and when I try below code it fails.

    Is it possible to map the columns names of tables using header row of csv before starting the insert?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to top