My task is to create a script to load a bunch of files into a DB model. The files are CSV, with quoted strings and headers.
Postgres has a COPY command, which I find fabulous for loading data. Unfortunately, the psycopg2 implementations, copy_from(), copy_to() and copy_expert(), have very few examples in the documentation, and I found their usage a bit of a challenge.
Initially, I thought of opening the file, parsing it myself and performing a copy_from(), like in the other script dealing with XML (i.e. lots of work).
copy_expert() requires a SQL statement and a file-like object. Therefore, the first step is to prepare the SQL statement for my file layout:
SQL_STATEMENT = """ COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ',' """
This statement is a "classic" COPY FROM statement, specifying that the input is a CSV file, that it has a header row, and that the delimiter is a comma. The "%s" is used to specify the destination table as a parameter.
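As a quick illustration of the interpolation, here is what the statement looks like after substituting a table name (a minimal sketch; "my_table" is just a placeholder):

```python
SQL_STATEMENT = """ COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ',' """

# Plain %-interpolation of the destination table name.
statement = SQL_STATEMENT % "my_table"
print(statement)  # →  COPY my_table FROM STDIN WITH CSV HEADER DELIMITER AS ','
```

Note that %-interpolation is only appropriate when the table name comes from trusted code, not from user input; for untrusted identifiers, psycopg2 provides the psycopg2.sql module (sql.SQL and sql.Identifier) to compose statements safely.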
The second step is to open the file:
my_file = open("csv_file")
I created a function to perform the load:
def process_file(conn, table_name, file_object):
    cursor = conn.cursor()
    cursor.copy_expert(sql=SQL_STATEMENT % table_name, file=file_object)
    conn.commit()
    cursor.close()
I can use it like so:
connection = psycopg2.connect(...)
try:
    process_file(connection, 'my_table', my_file)
finally:
    connection.close()
Note: please verify that SQL_STATEMENT matches your file's format (e.g. the right delimiter, the right quote character…). Also, make sure the first row of the file matches the column names in the table (for the columns that are present).
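That header check can be done up front with the standard csv module before handing the file to copy_expert(). A minimal sketch (the sample content and the column names are illustrative assumptions, not from the original script):

```python
import csv
import io

# Hypothetical CSV content; in practice this would be the opened file object.
sample = io.StringIO('id,"name",price\n1,"widget",9.99\n')

# Read only the header row, honoring quoted fields.
header = next(csv.reader(sample))

# Columns assumed to exist in the destination table (illustrative names).
table_columns = {"id", "name", "price"}
missing = [col for col in header if col not in table_columns]
assert not missing, f"CSV columns not found in table: {missing}"
```

Remember to rewind the file (file_object.seek(0)) after reading the header if you reuse the same file object for the COPY, since copy_expert() consumes the stream from its current position.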