What is XML?

The following points summarize what XML is and what it is for.

  • XML stands for eXtensible Markup Language.
  • XML is a markup language much like HTML.
  • XML was designed to describe data.
  • XML tags are not predefined in XML. You must define your own tags.
  • XML can use a DTD (Document Type Definition) to describe the structure of the data.
  • XML with a DTD is designed to be self-describing.

It is important to understand that XML is not a replacement for HTML. The main purpose of HTML is to format data for presentation in a browser. XML is not meant to format data for display; it is mostly used to store, transfer, and describe data. It is independent of any device or programming language and can be used to transmit data to any device. A parser (a program that understands the tags and returns the text in a valid format) on the receiving device helps display the data in the required format.

You can define your own tags in an XML file. How these tags are interpreted depends on the program that receives the file: the data embedded in the tags is used according to the logic implemented in that receiving program. This point will become clearer when we explain how to use parsers in the next few paragraphs.
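As a minimal sketch of this idea, the Python snippet below uses the standard library's xml.etree.ElementTree module as the parser. The tag names order, item and price, and the summarize function, are made up for this illustration; the parser only returns the structure, and the receiving program decides what the tags mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the tags <order>, <item> and <price> are our own invention.
FEED = """
<order>
  <item>Notebook</item>
  <price>4.50</price>
</order>
"""

def summarize(xml_text):
    """Give the custom tags a meaning; the parser itself attaches none."""
    root = ET.fromstring(xml_text)
    item = root.findtext("item")             # text inside <item>
    price = float(root.findtext("price"))    # this program treats <price> as a number
    return f"{item} costs {price:.2f}"

print(summarize(FEED))
```

Another program could read the same file and use the price tag in a completely different way, for example to add up a total.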

XML Declarations

Most XML tags have a name associated with them. Here we explain the different terms used for the elements defined in an XML file.

Well Formed Tags:

One of the most important requirements of an XML file is that it must be well formed. Among other things, this means that every tag must have a closing tag. In an HTML file, some tags such as <br> do not need a closing </br> tag, whereas in an XML file a closing tag is compulsory, so we have to write <br></br> or <br/>. Tags closed in this way are called well-formed tags.
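One way to see this rule in action is to hand a few snippets to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is our own and not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>line one<br/>line two</p>"))       # <br/> closes itself
print(is_well_formed("<p>line one<br></br>line two</p>"))   # explicit closing tag
print(is_well_formed("<p>line one<br>line two</p>"))        # <br> is never closed
```

The first two snippets are accepted, while the third is rejected because the br tag has no closing tag.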

Elements and Attributes:

Each element in an XML file can have attributes and can contain other elements. Here is what a typical tag looks like.

<Email
    to="admin@mydomain.com"
    from="user@mySite.com"
    subject="Introducing XML">

</Email>

In this example, Email is called an element. The Email element has three attributes: to, from, and subject.
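To show how a program sees the element and its attributes, here is a short sketch that assumes Python's standard xml.etree.ElementTree module. The variable names are ours; the Email tag is the one from the example above.

```python
import xml.etree.ElementTree as ET

EMAIL = (
    '<Email to="admin@mydomain.com" '
    'from="user@mySite.com" '
    'subject="Introducing XML"></Email>'
)

email = ET.fromstring(EMAIL)

print(email.tag)              # the element name
print(email.attrib["to"])     # attributes are exposed as a dictionary
print(email.get("from"))      # .get() reads a single attribute
print(email.get("subject"))
```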

The following rules must be followed when declaring XML element names:

  • Names can contain letters, numbers, and other characters such as "-", "_" and "."
  • Names must start with a letter or "_" (underscore); they must not start with a number, "-" or "."
  • Names must not start with the letters xml (or XML or Xml ..), which are reserved
  • Names cannot contain spaces
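A quick way to try out candidate names is to let a parser judge them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper parses_as_element_name is a hypothetical name of our own. It only checks what the parser accepts, so it is a rough test rather than a full implementation of the naming rules.

```python
import xml.etree.ElementTree as ET

def parses_as_element_name(name):
    """Return True if the parser accepts <name/> as one empty element."""
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guard against input that parses as a name followed by attributes.
    return root.tag == name and not root.attrib

for candidate in ["author_name", "_private", "1st_author", "author name"]:
    print(candidate, parses_as_element_name(candidate))
```

Here author_name is accepted, and so is _private, because the parser allows a leading underscore. The name starting with a digit and the name containing a space are both rejected.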

Apart from the xml prefix, any name can be used and no words are reserved, but the idea is to make names descriptive. Names with an underscore separator work well.

Examples: <author_name>, <published_date>.

Avoid "-" and "." in names. It could be a mess if your software tried to subtract name from author (author-name) or thought that "name" is a property of the object "author" (author.name).

Element names can be as long as you like, but don't exaggerate. Names should be short and simple, like this: <author_name>, not like this: <name_of_the_author>.

XML documents often have a parallel database, where field names correspond to element names. A good rule is to use the naming rules of your database.

Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.

The ":" should not be used in element names because it is reserved to be used for something called namespaces.

Empty Tags:

When an element has no content and no sub-tags, you can close it inside the opening tag itself by adding a "/" just before the closing ">". For example, declaring

<Text></Text> is the same as declaring <Text/>
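The equivalence of the two forms can be checked with a parser. This is a small sketch assuming Python's standard xml.etree.ElementTree module; the variable names are ours.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<Text></Text>")
short_form = ET.fromstring("<Text/>")

# Both forms give an element with the same name, no text and no child elements.
for element in (long_form, short_form):
    print(element.tag, element.text, len(element))
```

After parsing, the program can no longer tell which of the two spellings was used in the file.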

Comments in XML File:

Comments in an XML file are declared the same way as comments in an HTML file.

<Text>Welcome To XML Tutorial</Text>
<!-- This is a comment -->
<Subject/>
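Comments are meant for human readers, and a parser will normally skip them. The sketch below assumes Python's standard xml.etree.ElementTree module with its default settings. The surrounding Message element is added here only so that the snippet has a single root element; it is not part of the example above.

```python
import xml.etree.ElementTree as ET

# <Message> is a hypothetical wrapper so the document has one root element.
DOC = """
<Message>
  <Text>Welcome To XML Tutorial</Text>
  <!-- This is a comment -->
  <Subject/>
</Message>
"""

root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(child_tags)   # the comment does not show up among the child elements
```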

The XML Prolog

An XML file normally starts with a prolog. The minimal prolog contains a declaration that identifies the document as an XML document, like this:

<?xml version="1.0"?>

The declaration may also contain additional information, like this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
  

The XML declaration may contain the following attributes:

version

Identifies the version of the XML markup language used in the data. This attribute is not optional.

encoding

Identifies the character set used to encode the data. "ISO-8859-1" is "Latin-1", the Western European and English language character set. (The default is UTF-8, a variable-width encoding of Unicode.)

standalone

Tells whether or not this document references an external entity or an external data type specification (see below). If there are no external references, then "yes" is appropriate.
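To illustrate how the encoding attribute is used, the sketch below builds a small document whose bytes are encoded as ISO-8859-1 and lets the parser decode them based on the declaration. It assumes Python's standard xml.etree.ElementTree module; the name element and its content are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; \u00e9 is the letter "e" with an acute accent.
text = (
    '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    '<name>Jos\u00e9</name>'
)

# In ISO-8859-1 the accented letter is stored as the single byte 0xE9.
data = text.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(data)
print(root.text)
```

If the declaration named a different encoding than the one actually used to write the bytes, the parser could reject the file or return the wrong characters.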

