Loading XSDs into Oracle through Python
Following the previous incursions in Oracle and their own documentation, we decided it was useful to attach schemas to the inserted data, just because only XML with schemas attached can be indexed. When you have some million entries, indexing seems like a good idea (once you decide what to index).
In Oracle, you have DBMS_XMLSCHEMA.registerSchema() to do just that. It has a bunch of parameters which are thoroughly described in their documentation. Suffices to say that you can choose to generate tables, objects or java beans off the loaded schemas. An example of call can be:
dbms_xmlschema.registerSchema(
schemaURL => 'mySchema.xsd',
schemaDoc => '<schema document as a string>',
local => true,
genTypes => false,
genTables => true,
enableHierarchy => dbms_xmlschema.ENABLE_HIERARCHY_NONE
);
In this case, we generate the associated tables but we don't generate custom types.
A Concrete Example
As mentioned elsewhere, XSDs can have dependencies and Oracle supports this (at some extent). My list of schemas in dependency order (i.e. the basic ones are at the beginning) is defined by:
SCHEMA_ORDER = [
'xocs-ani512-xml.xsd',
'xocs-ani512-ani-ce.xsd',
'xocs-ani512-ce.xsd',
'xocs-ani512-cto.xsd',
'xocs-ani512-ait.xsd',
'xocs-ani512.xsd',
]
When actually writing the loader code, I've decided that using PL/SQL throughout the DB-related code would be preferred. This is also quite feasible because I'm not performing selects to require sequential access to data. To this effect, my insertion string looks like:
INSERT_STR_MULTI = """
declare
vSCHEMA clob;
begin
%(segments)s
dbms_xmlschema.registerSchema(
schemaURL => '%(name)s',
schemaDoc => vSCHEMA,
local => true,
genTypes => false,
genTables => true,
enableHierarchy => dbms_xmlschema.ENABLE_HIERARCHY_NONE
);
end;
"""
This string is using the approach described in this previous article.
The whole process of loading them is:
for xsd_file in SCHEMA_ORDER:
xsd_path = os.path.join('/my/xsd/dir', xsd_file)
# Add the schema
#
try:
print "Insert [%s] schema" % xsd_file
xsd_content = open(xsd_path).read()
if len(xsd_content) > 3000:
print "... Large XML file. Splitting"
result = ""
while len(xsd_content) > 3000:
result += "vSCHEMA := CONCAT(vSCHEMA, '%s');\n" % xsd_content[:3000]
xsd_content = xsd_content[3000:]
result += "vSCHEMA := CONCAT(vSCHEMA, '%s');\n" % xsd_content
insert_str = INSERT_STR_MULTI % \
{
'name': xsd_file,
'segments': result,
}
else:
insert_str = INSERT_STR % \
{
'name': xsd_file,
'content': xsd_content,
}
cursor.execute(insert_str)
connection.commit()
except Exception as e:
print "Fail to add [%s]" % xsd_path, e
Several explanations:
- The loop iterates through the list of schemas and loads them sequentially
- The try ... catch code is designed to show me if any XSD fails (e.g. missing dependency) without taking the whole program down.
- The inner code contains the logic for splitting large schemas
References
- How to import schema?
- ORA-01704: string literal too long
- CreateSchemaBasedXML using multiple schemas
- CONCAT
- Using XQuery with Oracle XML DB
- REGISTERSCHEMA Procedures
- Loading and Retrieving Large Documents with Collections
- XML Schema Definition
- XML Schema Primer
- Structured Mapping of XMLType
- Using Oracle XML DB and XML Schema
- XML Schema: Understanding Structures
- Shred XML data
- Return variable from cx_Oracle PL/SQL call in Python
- Oracle PLSQL CreateSchemaBasedXML using multiple schemas
- Tool that can combine many XSD files into one?
- Oracle 11g XMLType object relational Storage -> XML Schema validation
HTH,