Introduction to XML



I suggest using Eclipse as our IDE during lab classes. It handles XML, DTD, XML Schema, XSLT, and XPath, and it also lets us write and run Java programs. That is enough for most of our classes.

Of course, you may use other tools if you prefer, but we won't be able to provide extra support for them.

The best version to use is Eclipse for Java EE Developers, because it handles XML without any additional plugins.

The most reliable way to get a proper installation of Eclipse is to download and unpack it in your personal account. Yes, it takes some MBs... :(

Other tools

We will also use:

  • a web browser (I prefer Firefox); sometimes it is worth comparing results in different browsers,
  • the Linux command line tools xmllint and xsltproc (it is possible to work without them),
  • some other libraries and tools, introduced when they are needed.

Some instructions, e.g. installation procedures, will be given for Linux. It will usually be possible to work in a Windows environment if you prefer, but in that case you will have to adapt the scenarios accordingly. The same applies if you use your own Mac.

XML syntax

We'll try not to repeat too much; see the lecture slides for an introduction if you were not present. You may also look at the old scenario (in Polish, no longer maintained).

The following examples contain well-formed XML documents.

Example 1.

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Example from Wikipedia -->

<?xml version="1.0" encoding="utf-8"?>
  The same text fragment written in 3 ways:
  <option>x > 0 &amp; x &lt; 100</option>
  <option>x > 0 &#38; x &#60; 100</option>
  <option><![CDATA[x > 0 & x < 100]]></option>

<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE main_element [
  <!ENTITY entity1 "This is the content of a simple entity">
  <!ENTITY entity2 "<element>This is the content of a <subelement>complex</subelement> one</element>">
] >
<!-- Here (before the main element) a processing instruction or a comment may occur, but not an element nor any text other than whitespace. -->
<main_element>
  <?instruction attribute="value" but it can also look like this?>
  <subelement attribute='Attribute value' inny-atrybut="References to simple entities: &entity1; &quot;">
    Text content <elem>mixed model</elem> żółty żółw.
    <!-- Comments and PIs allowed -->
    <empty_element may_have="attributes"/>
  </subelement>
  Text content &entity1; &#502;
  <![CDATA[x < 5 && x > -5]]>
</main_element>
<!-- Here also a PI or a comment may occur, but not an element nor text. -->
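The claim that the three option elements carry the same text can be checked with a parser. Below is a minimal Java sketch using the JAXP DOM parser from the JDK. The class name EscapingDemo and the root element name options are assumptions made for this illustration; only the three option lines come from the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EscapingDemo {
    // Assumption: the root element name "options" is invented for this sketch;
    // the three option elements are copied from the example.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<options>\n"
        + "  <option>x > 0 &amp; x &lt; 100</option>\n"
        + "  <option>x > 0 &#38; x &#60; 100</option>\n"
        + "  <option><![CDATA[x > 0 & x < 100]]></option>\n"
        + "</options>";

    /** Parses the sample and returns the text content of each option element. */
    static String[] optionTexts() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList options = doc.getElementsByTagName("option");
            String[] texts = new String[options.getLength()];
            for (int i = 0; i < texts.length; i++) {
                texts[i] = options.item(i).getTextContent();
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("the sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Each line printed here should be the same unescaped text.
        for (String text : optionTexts()) {
            System.out.println(text);
        }
    }
}
```

Predefined entities, character references, and CDATA sections differ only in the source text; the application receives identical character data.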

Task 1.

Create files with the .xml extension in your workspace (not necessarily in Eclipse), copy the contents of the examples into them, and open the files in your web browser.

Task 2.

Correct the syntax errors in the document dok1.xml. Use a web browser or the xmllint program to check the file.

Eclipse would also find the errors, but this would be too easy...

In the following tasks, add the fragments to the XML documents you have already created.
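If neither a browser nor xmllint is at hand, a well-formedness check can also be written in a few lines of Java. The following is only a sketch based on the JAXP DOM parser; the class and method names (WellFormedCheck, firstError) are made up for this example and are not part of the course materials.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Returns null if the text is well-formed XML, otherwise a short error description. */
    static String firstError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<a><b>text</b></a>")); // well-formed
        System.out.println(firstError("<a><b>text</a></b>")); // wrongly nested tags
    }
}
```

The parser stops at the first fatal error, so a document with several mistakes has to be corrected and re-checked step by step. To check a file instead of a string, DocumentBuilder.parse also accepts a java.io.File.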

Task 3.

Write the character sequence ]]> in a document body in several different ways.

Task 4.

Write the expression "x > -5" & 'x < 5' as an attribute value.

Defining entities

Example 2. Entities

Download and unpack the example archive. Import it as a project into your Eclipse workspace, although Eclipse is not required for this part of the class.

The folder entities contains examples of entity definitions and usage. Use this command to print a document with all entities resolved:

xmllint -loaddtd -noent file1.xml

Run the above xmllint command for documents file1.xml, file2.xml, and file3.xml.
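For comparison, here is a rough Java counterpart of resolving entities, again only a sketch. It does not use the files from the entities folder; instead it reuses the two internal entities from Example 1. The class name EntityDemo and its helper methods are assumptions made for this illustration. By default the JAXP DOM parser expands entity references, so the resulting tree contains the replacement text and elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {
    // The entity definitions are taken from Example 1; the document body is invented.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE main_element [\n"
        + "  <!ENTITY entity1 \"This is the content of a simple entity\">\n"
        + "  <!ENTITY entity2 \"<element>This is the content of a "
        + "<subelement>complex</subelement> one</element>\">\n"
        + "]>\n"
        + "<main_element>&entity1; &entity2;</main_element>";

    static Document parse() {
        try {
            // Entity references are expanded by default (expandEntityReferences = true).
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("the sample could not be parsed", e);
        }
    }

    /** Text of the whole document after entity expansion. */
    static String resolvedText() {
        return parse().getDocumentElement().getTextContent();
    }

    /** Number of elements with the given name after entity expansion. */
    static int count(String tagName) {
        return parse().getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) {
        System.out.println(resolvedText());
        System.out.println("subelement elements: " + count("subelement"));
    }
}
```

Note that the complex entity contributes real elements to the tree, not just text.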

Task 5.

Try to do the following things and verify the file with xmllint. Are they correct?

  1. Refer to an external entity from an attribute.
  2. Refer to a complex entity from an attribute.
  3. Refer to an internal entity from another internal entity definition. Does the order make a difference?
  4. Refer to an entity from an external entity content.
  5. Prepare two entities so that an element starts in one entity and ends in the other, with both entities referenced one after the other in a document.

Document structure

Designing the structure of XML documents is an important and non-trivial activity, comparable to domain analysis (something which usually results in a UML class diagram).

In a typical application of XML, semantic (aka descriptive) markup is preferred to presentational markup. Tags should reflect the logical, tree-like structure of a document, names should be descriptive but not too long ;) and the same name should be used for elements that play the same role.

Sometimes it is good to start modelling a new XML application with a concrete example. Let's do so with the above hints in mind.

Task 6. Visit card

Create an XML document – your visit card.

  1. Establish the document structure and tag names by yourself.
  2. Check the syntax correctness using a web browser.
  3. Use some non-ASCII characters and declare the correct encoding.

Instead of a visit card, you can prepare an example document (or a fragment) related to your assessment project.
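Why the declared encoding matters can be seen in a small experiment. The Java sketch below serialises the same document to bytes in one encoding and declares a possibly different one in the XML declaration. The class name EncodingDemo, the method parseWith, and the element names card and name are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {
    // "żółty żółw" written with Unicode escapes, so the Java source stays pure ASCII.
    static final String TEXT = "\u017C\u00F3\u0142ty \u017C\u00F3\u0142w";

    /**
     * Builds a document that declares one encoding, stores it as bytes in
     * another (or the same) encoding and parses it back.
     * Returns the parsed text, or null if the parser rejects the bytes.
     */
    static String parseWith(String declaredEncoding, String actualEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>\n"
                + "<card><name>" + TEXT + "</name></card>";
        byte[] bytes = xml.getBytes(Charset.forName(actualEncoding));
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no messages on stderr
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWith("UTF-8", "UTF-8"));           // declaration matches
        System.out.println(parseWith("iso-8859-2", "ISO-8859-2")); // declaration matches
        System.out.println(parseWith("UTF-8", "ISO-8859-2"));      // mismatch
    }
}
```

When the declaration matches the bytes, the text comes back intact in both encodings. In the mismatched case the ISO-8859-2 bytes are not valid UTF-8, so the parser should reject the document.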


The mechanism of namespaces is presented in the following examples.

Example 3. XML namespaces in various ways

The example from the lecture, without any namespaces.

<?xml version="1.0"?>
<article code="A1250">
  <title>Assignment in Pascal and C</title>
    <fname>Jan</fname> <surname>Mądralski</surname>
    <address>...</address>
    <paragraph>
      Assignment is written as <code>x = 5</code> in C
      and <code>x := 5</code> in Pascal.
    </paragraph>
</article>

Canonical use of namespaces with prefixes. All prefixes are declared in the root element and used consistently throughout the document.

<?xml version="1.0"?>
<art:article code="A1250"
    xmlns:art="" xmlns:t="">
  <art:title>Assignment in Pascal and C</art:title>
    <fname>Jan</fname> <surname>Mądralski</surname>
      Assignment is written as <t:code>x = 5</t:code> in C
      and <t:code>x := 5</t:code> in Pascal.
</art:article>

Overriding of prefixes in parts of the document tree. BTW, this artificial example shows a bad practice: the same prefix pre is used for different namespaces.

<?xml version="1.0"?>
<pre:article code="A1250" xmlns:pre="">
  <pre:title>Assignment in Pascal and C</pre:title>
    <fname>Jan</fname> <surname>Mądralski</surname>
    <pre:address xmlns:pre="urn:addresses">...</pre:address>
    <pre:paragraph xmlns:pre="">
      Assignment is written as <pre:code>x = 5</pre:code> in C
      and <pre:code>x := 5</pre:code> in Pascal.
    </pre:paragraph>
</pre:article>

The default namespace is used in the following:

<?xml version="1.0"?>
<article code="A1250" xmlns="">
  <title>Assignment in Pascal and C</title>
    <fname>Jan</fname> <surname>Mądralski</surname>
    <address xmlns="urn:addresses">...</address>
    <paragraph xmlns="">
      Assignment is written as <code>x = 5</code> in C
      and <code>x := 5</code> in Pascal.
    </paragraph>
</article>
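From the point of view of a namespace-aware parser, the prefixed and the default-namespace notations are just two spellings of the same element names. The Java sketch below illustrates this on a shortened version of the article example. The namespace name urn:example:article is a made-up stand-in (the namespace names of the original examples are not shown here), urn:addresses comes from the examples, and the class name NamespaceDemo is an assumption as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Assumption: "urn:example:article" is a hypothetical namespace name.
    static final String PREFIXED =
          "<pre:article code=\"A1250\" xmlns:pre=\"urn:example:article\">"
        + "<pre:title>Assignment in Pascal and C</pre:title>"
        + "<pre:address xmlns:pre=\"urn:addresses\">...</pre:address>"
        + "</pre:article>";

    static final String DEFAULTED =
          "<article code=\"A1250\" xmlns=\"urn:example:article\">"
        + "<title>Assignment in Pascal and C</title>"
        + "<address xmlns=\"urn:addresses\">...</address>"
        + "</article>";

    /** Returns "{namespace}localName" for every element, in document order. */
    static List<String> expandedNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Node n = all.item(i);
                names.add("{" + n.getNamespaceURI() + "}" + n.getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("the sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(expandedNames(PREFIXED));
        System.out.println(expandedNames(DEFAULTED));
    }
}
```

Both calls should print the same list: the prefix itself is not part of the element's identity, only the namespace name and the local name are.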

Task 7.

Open the document dok2.xml in a browser. Correct the namespace-related errors.

Task 8.

Run the program PrintAllTags (provided in the project mentioned above), passing different files as arguments.

Experiment with the namespaceAware setting and with namespace declarations in documents to see how the parser interprets namespace declarations.

Try to parse dok2.xml with different settings of namespace awareness.
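To get a feeling for what the setting changes, here is a tiny stand-alone sketch. It is not the PrintAllTags program from the project; the class name NamespaceAwarenessDemo, the method describeRoot, and the namespace name urn:example:article are all invented for this illustration. It parses the same one-element document twice with the JAXP DOM parser and reports what the root element looks like.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwarenessDemo {
    // Assumption: "urn:example:article" is a hypothetical namespace name.
    static final String XML =
        "<pre:title xmlns:pre=\"urn:example:article\">Assignment</pre:title>";

    /** Describes the root element as seen by a parser with the given setting. */
    static String describeRoot(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            return "nodeName=" + root.getNodeName()
                    + ", localName=" + root.getLocalName()
                    + ", namespaceURI=" + root.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("the sample could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeRoot(false));
        System.out.println(describeRoot(true));
    }
}
```

With namespace awareness switched off, the parser treats pre:title as an opaque name and xmlns:pre as an ordinary attribute, so the local name and namespace are not available. With the setting switched on, the name is split into a namespace name and a local name.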

Task 9.

  1. Place all elements from your visit card in a new namespace.
  2. Create a document called rolodex (a set of visit cards), or visit-cards-set if you prefer, whose root element belongs to a separate namespace. The rolodex should contain one or more visit cards in their own namespace (in summary: two namespaces should be used in a single document).

If you chose to work on a different example than visit cards, simply try to use more than one namespace in your document.
