TL;DR: Here I present a SQL example from creating the table to performing a SELECT.
Table of Contents
- Part 1 - Prerequisites
- Part 2 - Create a table and perform a SELECT
- Part 3 - User Python to insert data
- Part 4 - references
Table and XMLTYPE
Once we've seen that we have XMLTYPE available, we can proceed to create a simple table.
The table structure
My problem required for a table with the following structure:
- An identifier (key) - 20 character long string
- An article field (nullable) - XML content
- A field with citations of the article (nullable) - XML content
The SQL statement to create the table is as follows:
CREATE TABLE ARTICLES_XML_DATA ( id VARCHAR(20) NOT NULL PRIMARY KEY, article XMLTYPE, citedby XMLTYPE );
This is straight-forward. However, as you can see, the table is quite dumb and it doesn't have any XSDs attached to its contents.
Adding a record
Adding a record can be done via a simple SQL insert. However you need to make sure you escape all quotes... A simple example which adds a record with the ID and CITEDBY fields is:
insert into ARTICLES_XML_DATA(ID, CITEDBY) values ( '00000000000000000003', '<cited-by> <count>2</count> <citing-doc> <eid>00000000000000000002</eid> </citing-doc> <citing-doc> <eid>00000000000000000001</eid> </citing-doc> </cited-by>' );
An XML structure
Once we have added several records, we can perform queries on them. As you can see from the example above, the CITEDBY field has a very simple structure:
- A count of citations and
- A list of document IDs that refer the current document
The XML from the example above is:
<cited-by> <count>2</count> <citing-doc> <eid>00000000000000000002</eid> </citing-doc> <citing-doc> <eid>00000000000000000001</eid> </citing-doc> </cited-by>
We can therefore perform some interesting queries. For example, we can display a table containing the article ID and the number of citations for that article. The query would be:
SELECT articles.ID, citations.NUMBER FROM ARTICLES_XML_DATA articles, XMLTABLE( 'cited-by/count' PASSING articles.CITEDBY COLUMNS NUMBER VARCHAR2(30) PATH '.' ) citations;
This requires some explanations:
- I want to query the ARTICLESXML_DATA with an alias of _articles. I will need the field ID from it.
- For every record, I want to PASS articles.CITEDBY,
- Get the content from XPATH of cited-by/count,
- Place it in a VARCHAR2(30) as a column named NUMBER and
- Refer to this result (table with a single column) as citations.
As you can see, the result looks very muck like a select on an inner join.
The result is something like:
ID NUMBER -------------------- ------ 00000000000000000001 2