Use XPath in PL/pgSQL for Batch Processing

A while ago I had to process XML loaded into a PostgreSQL database. The task was to extract references from a publication structure. Fortunately, PostgreSQL lets you use XPath in PL/pgSQL, but there are some things you need to be aware of. First, you need to collect all the namespaces the documents use.

Then, you can use the […]
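The namespace point is easy to see outside the database too. Below is a minimal sketch using Python's standard `xml.etree.ElementTree` (not the post's PL/pgSQL code): a prefixed XPath only matches once every prefix is mapped to its URI, and the same prefix/URI pairs can be rendered as the two-dimensional text array that PostgreSQL's `xpath(xpath, xml, nsarray)` takes as its third argument. The namespace URIs, the sample document and the helper names `reference_titles` and `pg_nsarray` are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix -> URI mapping and sample document (illustration only).
NAMESPACES = {
    "doc": "http://example.com/ns/publication",
    "ref": "http://example.com/ns/reference",
}

SAMPLE = """<doc:publication xmlns:doc="http://example.com/ns/publication"
                             xmlns:ref="http://example.com/ns/reference">
  <ref:reference id="r1"><ref:title>First</ref:title></ref:reference>
  <ref:reference id="r2"><ref:title>Second</ref:title></ref:reference>
</doc:publication>"""


def reference_titles(xml_text: str, namespaces: dict[str, str]) -> list[str]:
    """Return the text of every ref:title, resolving prefixes through `namespaces`."""
    root = ET.fromstring(xml_text)
    # If a prefix is missing from the mapping, ElementTree raises SyntaxError.
    return [t.text for t in root.findall(".//ref:reference/ref:title", namespaces)]


def pg_nsarray(namespaces: dict[str, str]) -> str:
    """Render the mapping as a 2-D text[] literal for PostgreSQL's xpath()."""
    def quote(s: str) -> str:
        # SQL string literal with embedded single quotes doubled.
        return "'" + s.replace("'", "''") + "'"

    pairs = ", ".join(f"ARRAY[{quote(p)}, {quote(uri)}]" for p, uri in namespaces.items())
    return f"ARRAY[{pairs}]"
```

In SQL, the rendered literal would go in the third position, e.g. `xpath('//ref:reference/ref:title/text()', some_xml_column, <array literal>)`; inside a PL/pgSQL function you would more likely build that array once and reuse it for every document in the batch.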

Add XML to PostgreSQL from Python

One of the projects I worked on was to import a large number of XML files into a PostgreSQL database (stored as XML). I chose Python to do it. Here are the steps:

Database

The data source is the SCOPUS database, which has a silly number of entries (approx. 20,000,000), most containing 2 XML files […]
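For a sense of what such an import loop can look like, here is a minimal sketch, not the post's actual script. It assumes the XML files sit under one directory, are UTF-8 encoded, and that any DB-API 2.0 connection is passed in (e.g. psycopg2 for PostgreSQL). Malformed files are skipped, and rows are inserted with `executemany` in fixed-size batches with one commit per batch. The helper names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, Iterator


def iter_xml_files(root: Path) -> Iterator[tuple[str, str]]:
    """Yield (relative path, XML text) for each well-formed *.xml file under root."""
    for path in sorted(Path(root).rglob("*.xml")):
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        try:
            ET.fromstring(text)  # cheap well-formedness check before the DB sees it
        except ET.ParseError:
            continue  # assumption: skip malformed files (log them in real use)
        yield str(path.relative_to(root)), text


def _chunks(rows: Iterable[tuple[str, str]], size: int) -> Iterator[list[tuple[str, str]]]:
    """Split an iterable into lists of at most `size` items."""
    it = iter(rows)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def load_xml(conn, rows: Iterable[tuple[str, str]], insert_sql: str, batch_size: int = 1000) -> int:
    """Insert rows with executemany, committing once per batch; return the row count."""
    cur = conn.cursor()
    total = 0
    try:
        for chunk in _chunks(rows, batch_size):
            cur.executemany(insert_sql, chunk)
            conn.commit()  # one commit per batch keeps each transaction bounded
            total += len(chunk)
    finally:
        cur.close()
    return total
```

With psycopg2, `insert_sql` might be something like `INSERT INTO scopus_xml (filename, content) VALUES (%s, XMLPARSE(DOCUMENT %s))`, where the table and column names are hypothetical. At tens of millions of rows, psycopg2's `extras.execute_batch` is worth trying in place of plain `executemany`, since it reduces round trips.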

Loading XSDs into Oracle through Python

Following our previous incursions into Oracle and its documentation, we decided it was useful to attach schemas to the inserted data, because only XML with a schema attached can be indexed. When you have several million entries, indexing seems like a good idea (once you decide what to index). In Oracle, you have DBMS_XMLSCHEMA.registerSchema() […]
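A minimal sketch of calling `DBMS_XMLSCHEMA.registerSchema()` from Python, assuming a python-oracledb (or cx_Oracle) cursor, whose `execute()` accepts named bind values as keyword arguments. Only the `SCHEMAURL` and `SCHEMADOC` parameters are passed, leaving the rest at their defaults. The schema URL and file path are placeholders, and `register_xsd` is a hypothetical helper, not the post's code.

```python
from pathlib import Path

# Anonymous PL/SQL block; only SCHEMAURL and SCHEMADOC are set, other
# registerSchema parameters keep their defaults.
REGISTER_SCHEMA_PLSQL = """
BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    SCHEMAURL => :schema_url,
    SCHEMADOC => :schema_doc);
END;"""


def register_xsd(cursor, schema_url: str, xsd_path: Path) -> None:
    """Read an XSD file and register it in Oracle under `schema_url`."""
    xsd_text = Path(xsd_path).read_text(encoding="utf-8")  # assumption: UTF-8 file
    # Named binds: :schema_url and :schema_doc are filled from the keyword arguments.
    cursor.execute(REGISTER_SCHEMA_PLSQL, schema_url=schema_url, schema_doc=xsd_text)
```

The XSD is bound as a plain string here, which works for small schemas; very large XSDs would need to be bound as a CLOB instead.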

Memory dump: Work with XMLField in Oracle (part 2)

TL;DR: Here I present an SQL example, from creating the table to performing a SELECT.

Table of Contents
Part 1 – Prerequisites
Part 2 – Create a table and perform a SELECT
Part 3 – Use Python to insert data
Part 4 – References

Table and XMLTYPE

Once we’ve seen that we have XMLTYPE available, […]
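For the "use Python to insert data" step, one possible shape is sketched below. It assumes a python-oracledb (or cx_Oracle) connection and a hypothetical table `xml_docs (id NUMBER PRIMARY KEY, doc XMLTYPE)`; the XML text is bound as a string and wrapped in the `XMLTYPE()` constructor in the SQL. This is an illustration under those assumptions, not the series' actual code.

```python
# Hypothetical table, for illustration only:
#   CREATE TABLE xml_docs (id NUMBER PRIMARY KEY, doc XMLTYPE)
INSERT_SQL = "INSERT INTO xml_docs (id, doc) VALUES (:id, XMLTYPE(:doc))"


def insert_documents(connection, docs: dict[int, str]) -> int:
    """Insert {id: xml_text} pairs in one executemany call; return the row count."""
    rows = [{"id": doc_id, "doc": xml_text} for doc_id, xml_text in docs.items()]
    cursor = connection.cursor()
    try:
        cursor.executemany(INSERT_SQL, rows)  # named binds come from each dict
        connection.commit()
    finally:
        cursor.close()
    return len(rows)
```

Documents longer than the VARCHAR2 bind limit need to be sent as CLOBs; with python-oracledb that should be possible via `cursor.setinputsizes(doc=oracledb.DB_TYPE_CLOB)` before `executemany`, but check the driver documentation for your version.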

Install lxml on Windows (in a virtualenv)

lxml is a nice Python library for XML processing. Its etree module is really quick, which makes things interesting if you have a large number of XML files (or a big one) to process. Installation on Linux/Mac is painless (OK, you need Homebrew on Mac to make it painless, but you get my point…). The other day […]
