BASIC XML PDF


This tutorial will teach you the basics of XML. The tutorial is divided into sections such as XML Basics, Advanced XML, and XML tools.



Author:DONYA SHINER
Language:English, Portuguese, French
Country:Venezuela
Genre:Fiction & Literature
Pages:724
Published (Last):17.07.2016
ISBN:491-8-56257-484-5
Uploaded by: LISETTE

Topics covered include: Parsing XML; A Basic XML Document; Differences Between XML and HTML; Common Mistakes; White Space; Closing Tags; Nesting Tags; Root Element.

HTML is about displaying information, while XML is about carrying information.

Many XML-based standards are quite complex, and it is not uncommon for a specification to comprise several thousand pages. XML is used extensively to underpin various publishing formats.

Disparate systems communicate with each other by exchanging XML messages. The message exchange format is standardised as an XML schema, which is also referred to as the canonical schema. XML has come into common use for the interchange of data over the Internet.

The key terms below are not an exhaustive list of all the constructs that appear in XML; they provide an introduction to the key constructs most often encountered in day-to-day use.

Character: An XML document is a string of characters. Almost every legal Unicode character may appear in an XML document.

Processor and application: The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.

Markup and content: The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules. Strings of characters that are not markup are content. In addition, whitespace before and after the outermost element is classified as markup.

Turning to XSLT, our style sheet begins with a stylesheet element. Its version attribute is required. Its xmlns:xsl attribute declares the XSLT namespace; in our example, we will use an xsl prefix on all the stylesheet-related tags in our XSL documents to associate them with this namespace.
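To make the link between the xsl prefix and its namespace concrete, here is a minimal sketch in Python using only the standard library. The style sheet fragment is a made-up illustration, not the one from the original tutorial; the point is only that the parser resolves the prefix to the namespace URI declared by xmlns:xsl.

```python
import xml.etree.ElementTree as ET

# The namespace URI that identifies XSLT elements.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical, minimal style sheet used only for illustration.
stylesheet = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
</xsl:stylesheet>"""

root = ET.fromstring(stylesheet)

# ElementTree stores qualified names as "{namespace-uri}local-name",
# so the prefix itself disappears and only the URI remains.
print(root.tag)
print(root.get("version"))

# When searching, we supply our own prefix-to-URI mapping.
output = root.find("xsl:output", {"xsl": XSL_NS})
print(output.get("method"))
```

The prefix is just a local shorthand: any other prefix bound to the same URI would identify the same elements.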

The next element will be the output element, which is used to define the type of output you want from the XSL file. Now we come to the heart of XSLT: the template and apply-templates elements. Together, these two elements make the transformations happen. Put simply, the XSLT processor (for our immediate purposes, the browser) starts reading the input document, looking for elements that match any of the template elements in our style sheet.

When one is found, the contents of the corresponding template element tell the processor what to output before continuing its search. Where a template contains an apply-templates element, the XSLT processor will search for XML elements contained within the current element and apply templates associated with them. The first thing we want to do is match the letter element that contains the rest of our document.

This is fairly straightforward. The template's match value, /letter, begins with a slash, which ties it to the letter element at the root of the document. Were the value simply letter, the template would match letter elements throughout the document. By default, apply-templates will match not only elements, but text and even whitespace between the elements as well. XSLT processors have a set of default, or implicit, templates, one of which simply outputs any text or whitespace it encounters.

To avoid this, we tell apply-templates which nodes to select with another XPath expression. Each of the remaining templates matches one of the elements we expect to find inside the letter element. In each case, we output a text label before the element's content. The last thing we have to do in the XSL file is close off the stylesheet element that began the file. Left this way, the output would contain unwanted whitespace: each of our three main templates begins with a line break and then some whitespace before the label, which is carried through to the output.

But wait: what about the line break and whitespace that ends each template? Well, by default, the XSLT standard mandates that whenever there is only whitespace (including line breaks) between two tags, the whitespace should be ignored.

But when there is text between two tags, the whitespace around that text is passed along to the output. The vast majority of XML books and tutorials out there completely ignore these whitespace treatment issues.

Best to get a good grasp of it now, rather than waiting for insanity to set in when you least expect it. This is where the text element comes in. All it does is output the text it contains, even if it is just whitespace. With it, each template can output its label without any stray whitespace around it.

This gives us the fine control over formatting that we need when outputting a plain text file. Are we done yet? Not quite.

When you view the XML document in Firefox, you should see something similar to the result pictured in Figure 2. Internet Explorer interprets the result as HTML code, even when the style sheet clearly specifies that it will output text.

As a result, whitespace is collapsed and our whole document appears on one line. For this reason, it is not yet practical to rely on browser support for XSLT in a real-world website. You should see something similar to Figure 2.

What happens if you need to transform your own XML document into an XML document that meets the needs of another organization or person? Not to worry: XSLT will save the day!

You see, Web browsers only supply collapsible tree formatting for XML documents without style sheets.

Building XML

XML documents that result from a style sheet transformation are displayed without any styling at all, or at best are treated as HTML, which is not at all the desired result. There are several things that need to be added to your style sheet to signal to the browser that the document is more than a plain XML file, though.

First, we declare a default namespace for tags without prefixes in the style sheet. Next, we can flesh out the output element to more fully describe the output document type. In addition to the method and indent attributes, we specify a number of new attributes, one of which omits the XML declaration from the output. Internet Explorer for Windows displays XHTML documents in Quirks Mode when this declaration is present, so by omitting it we can ensure that this browser will display the document in the more desirable Standards Compliance mode.

The rest of the style sheet is as it was for the HTML output example we saw above.

Now, we need to identify exactly what we need for our news items, binary files, and Web copy. We must also manage and track site administrators using XML. Compared to our article content type, news will be fairly straightforward. We will need to track several pieces of information for each news item.

The easiest way to keep track of copy is to treat each piece a little like an article. An XML document that tracks a piece of Web copy will therefore look much like one that tracks an article. We will need to keep track of each administrator on the site, as these are the folks who can log in and make changes to advertisement copy, articles, news pieces, and binary files.

After that, you should have enough of a working knowledge of XML and its wacky family to really start development.

In fact, in many contexts, consistency can be a very beautiful thing. Remember that XML allows you to create any kind of language you want. In many cases, as long as you follow the rules of well-formedness, just about anything goes in XML. However, there will come a time when you need your XML document to follow some rules (to pass a validity test), and those times will require that your XML data be consistently formatted.

What we need is a way to enforce that kind of rule. In XML, there are two ways to set up consistency rules: DTDs and XML Schema. A DTD (document type definition) is a tried and true (if old-fashioned) way of achieving consistency.

Each of these technologies contains lots of hidden nooks and crannies crammed with rules, exceptions, notations, and side stories. Speaking of side stories, did you know that DTD actually stands for two things? It stands not just for document type definition, but also document type declaration.

The declaration consists of the lines of code that make up the definition.

As for why consistency matters, many possible answers spring to mind. Using a system to ensure consistency allows your XML documents to interact with all kinds of applications, contexts, and business systems, not just your own. The way DTDs work is relatively simple.

Consider a DTD written for a simple letter document. Those of you who are paying attention will notice some remarkable similarities between such a DTD and the Letter to Mother example that we worked on in Chapter 2, XML in Practice.

XML basics for new users

In fact, if you look closely, each line of the DTD provides a clue as to how our letter should be structured. A line that begins with !ELEMENT is called an element declaration. You can declare elements in any order you want, but they must all be declared in the DTD.

A DTD element declaration consists of a tag name and a definition in parentheses. These parentheses can contain rules for plain text, a single child element, or a sequence of elements.

In this case, we want the letter element to contain, in order, the elements to, from, and message. The sequence of child elements is comma-delimited.


In fact, to be more precise, the sequence not only specifies the order in which the elements should appear, but also, how many of each element should appear. In this case, the element declaration specifies that one of each element must appear in the sequence. If our file contained two from elements, for example, it would be as invalid as if it listed the message element before to.
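The rule that a sequence fixes both order and count can be sketched in a few lines of Python. This is not DTD validation (the Python standard library does not validate against DTDs); it is a hand-written stand-in that checks only the one rule described above, and the sample letters are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The sequence the letter element must contain: one of each, in this order.
EXPECTED = ["to", "from", "message"]

def follows_sequence(xml_text, expected=EXPECTED):
    """Return True if the root is <letter> and its children match the sequence exactly."""
    root = ET.fromstring(xml_text)
    return root.tag == "letter" and [child.tag for child in root] == expected

# Hypothetical sample documents.
valid = "<letter><to>Mother</to><from>Me</from><message>Hi</message></letter>"
two_from = ("<letter><to>Mother</to><from>Me</from><from>Also me</from>"
            "<message>Hi</message></letter>")
wrong_order = "<letter><message>Hi</message><to>Mother</to><from>Me</from></letter>"

for name, doc in [("valid", valid), ("two_from", two_from), ("wrong_order", wrong_order)]:
    print(name, follows_sequence(doc))
```

A letter with two from elements fails for the same reason as one with message listed first: the list of child names no longer equals the declared sequence.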

How do you specify a different number of occurrences? With a neat little system of notation, defined in Table 3.

After the letter declaration, we see three more declarations, each of which uses the #PCDATA notation. Whenever you see this notation in a DTD, you know that the element must contain only text. A mixed-content declaration, by contrast, can allow a paragraph element to contain any combination of plain text and b, i, u, and highpriority elements.

Note that with mixed content like this, you have no control over the number or order of the elements that are used. What about elements such as hr and br, which in HTML contain no content at all? These are called empty elements, and are declared in a DTD with the EMPTY keyword in place of a content model.

Remember attributes? An attribute declaration is structured differently than an element declaration.

For one thing, we define it with !ATTLIST instead of !ELEMENT. Also, we must include in the declaration the name of the element that contains the attribute(s), followed by a list of the attributes and their possible values.

An attribute declared with the CDATA type can contain any string of characters or numbers.

The #IMPLIED keyword, in DTD-speak, means that the attribute is optional. The gender attribute is handled differently: instead of allowing any arbitrary text, the DTD limits the values to either male or female. If, in our document, an actor element fails to contain a gender attribute, or contains a gender attribute with values other than male or female, then our document would be deemed invalid.
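The effect of a required, enumerated attribute can be imitated with a small hand-written check. The sketch below is an assumption-laden stand-in for real DTD validation: the actors document and the actor names are invented, and only the gender rule described above is checked.

```python
import xml.etree.ElementTree as ET

# The only values the gender attribute may take.
ALLOWED_GENDERS = {"male", "female"}

def invalid_actors(xml_text):
    """Return the text of every actor whose gender attribute is missing or not allowed."""
    root = ET.fromstring(xml_text)
    return [actor.text for actor in root.iter("actor")
            if actor.get("gender") not in ALLOWED_GENDERS]

# Hypothetical sample data.
sample = """<actors>
  <actor gender="male">First Actor</actor>
  <actor>Second Actor</actor>
  <actor gender="unknown">Third Actor</actor>
</actors>"""

print(invalid_actors(sample))
```

Both failure modes named in the text show up here: the second actor has no gender attribute at all, and the third has a value outside the permitted list.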

The actorid attribute has been designated an ID. In DTD-speak, an ID attribute must contain a unique value, which is handy for product codes, database keys, and other identifying factors. In our example, we want the actorid attribute to uniquely identify each actor in the list. The ID type set for the actorid attribute ensures that our XML document is valid if and only if a unique actorid is assigned to each actor.

Incidentally, if you want to declare an attribute that must contain a reference to a unique ID that is assigned to an element somewhere in the document, you can declare it with the IDREF attribute type.
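What ID and IDREF guarantee can be sketched as two simple checks: every ID value appears once, and every reference points at an ID that exists. This is an illustration rather than DTD validation; the role element and the actorref attribute are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, id_attr="actorid"):
    """Return ID values that are used by more than one element."""
    counts = Counter(el.get(id_attr) for el in root.iter()
                     if el.get(id_attr) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

def dangling_refs(root, id_attr="actorid", ref_attr="actorref"):
    """Return reference values that do not match any declared ID."""
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    return sorted(el.get(ref_attr) for el in root.iter()
                  if el.get(ref_attr) is not None and el.get(ref_attr) not in ids)

# Hypothetical sample data: "a1" is used twice, and "a9" is never declared.
cast = ET.fromstring("""<cast>
  <actor actorid="a1">First</actor>
  <actor actorid="a2">Second</actor>
  <actor actorid="a1">Third</actor>
  <role actorref="a2"/>
  <role actorref="a9"/>
</cast>""")

print(duplicate_ids(cast))
print(dangling_refs(cast))
```

A validating parser would reject this document on both counts; the sketch simply reports which values cause the trouble.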

An entity is a piece of XML code that can be used and reused in a document with an entity reference. There are different types of entities, including general, parameter, and external.

General entities are basically used as substitutes for commonly-used segments of XML code. For example, an entity declaration can hold the copyright information for a company, so that the text is written once and referenced wherever it is needed. Parameter entities are both defined and referenced within DTDs. For example, a single parameter entity can state that each of the elements paragraph, intro, sidebar, and note can contain regular text as well as b, i, u, citation, and dialog elements. Not only does the use of a parameter entity reduce typing, it also simplifies maintenance of the DTD.
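A general entity of the kind described above can be seen in action with Python's standard parser, which expands entities declared in a document's internal DTD subset. The entity name, the company text, and the page and footer elements below are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# A document that declares a general entity in its internal DTD subset
# and then references it with &copyright; in the content.
doc = """<!DOCTYPE page [
  <!ENTITY copyright "Copyright Example Company. All rights reserved.">
]>
<page><footer>&copyright;</footer></page>"""

root = ET.fromstring(doc)

# The parser has already replaced the entity reference with its text.
footer_text = root.findtext("footer")
print(footer_text)
```

Changing the declaration in one place changes every spot where the reference is used, which is the maintenance benefit the text describes.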

External entities point to external information that can be copied into your XML document at runtime. For example, you could include a stock ticker, inventory list, or other file, using an external entity.

An external DTD is usually a file with a file extension of .dtd. To use one, you must first edit the XML declaration to include the standalone="no" attribute. You then add a DOCTYPE declaration that names the DTD file; for a letter document, this will search for the letter.dtd file. If the DTD lives on a Web server, you might point to its URL instead.

Finally, XML Schema provides very fine control over the kinds of data contained in an element or attribute. Now, for some major drawbacks: most of the criticism aimed at XML Schema is focused on its complexity and length.

Okay, now you know a lot more about DTDs than you did before.

The first thing you do is take a look at the dozens of corporate memos you and your colleagues have received in the past few months.

After a day or two of close examination, a pattern emerges. Although your first impulse might be to run out and create a sample XML memo document, please resist that urge for now. Because these memos are internal to the company, and there may be a need for a separate external memo DOCTYPE, you decide to use internalmemo as your root element name.

The first element, the root element, is internalmemo. This element will contain all the other elements, which hold date, sender, recipient, subject line, and all other information. Because these represent a lot of elements, it would be useful to split your document into two logical partitions: a header and a body. The header will contain recipient, subject line, date, and other information. The body will contain the actual text of the memo. In DTD syntax, the declaration for internalmemo states that it must contain one header element and one body element.

Next, we will indicate which elements these will contain. The declaration for header states that it must contain single date, sender, and recipients elements, an optional blind-recipients element, and then a subject element. The declaration for body states that it must contain one or more para elements, followed by a single sig element. Most of the other elements will contain plain text, except the para elements, in which we will allow bold and italic text formatting.
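The header and body rules above (fixed order, one optional element, one-or-more repetition) map neatly onto regular expressions over the list of child names. The sketch below uses that trick as a stand-in for DTD validation, which the Python standard library does not provide; the sample memo is invented.

```python
import re
import xml.etree.ElementTree as ET

# Each child name is followed by a space, so the models are written the same way.
# "?" marks an optional element and "+" marks one or more, mirroring DTD notation.
HEADER_MODEL = re.compile(r"date sender recipients (blind-recipients )?subject ")
BODY_MODEL = re.compile(r"(para )+sig ")

def children_match(element, model):
    """Return True if the element's child names, in order, satisfy the model."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None

# Hypothetical memo with empty elements, enough to exercise the structure.
memo = ET.fromstring(
    "<internalmemo>"
    "<header><date/><sender/><recipients/><subject/></header>"
    "<body><para/><para/><sig/></body>"
    "</internalmemo>"
)
body_without_para = ET.fromstring("<body><sig/></body>")

print(children_match(memo.find("header"), HEADER_MODEL))
print(children_match(memo.find("body"), BODY_MODEL))
print(children_match(body_without_para, BODY_MODEL))
```

The header passes without a blind-recipients element because that element is optional, while a body with no para element fails the one-or-more rule.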

That was simple enough. Some pieces of information, such as priority, are hardly ever displayed on a document; they are used only for administrative purposes. In any case, we want to be able to control the data that document creators put in for values such as priority.

The best way to store these pieces of information is to add them as attributes to the root element. To do that, we need to add an attribute declaration to our DTD. When you validate the document, the result should look a lot like Figure 3. Do you see how, under Results, it reads "No errors or warnings found."? In Dreamweaver MX, the results list for a valid document is simply empty, and the status bar beneath the list reads "Complete."

What happens if some things are out of place? Notice that Dreamweaver MX tells you where the problem lies with a specific line number and provides a description of the problem.

The validator catches that too, as you can see in Figure 3.

Figure 3. Error resulting from a misplaced element.

Again, the validator gives you a line number and a description that can lead you to resolve the problem. All you need to do is put the sender element back in the prescribed order, and the document will validate once more. In that case, we embedded the DTD right into the file. You now have a reusable DTD that you can apply to other internal memos.

We now understand articles, news stories, binary files, and Web copy, and are well on our way to completing the requirements-gathering phase of the project — we can start coding soon! If you recall, we are tracking author, status, keyword, and other vital information in separate files. That is, each individual article, news story, binary file, and Web copy file keeps track of its own keywords, status, author, and dates. If we wanted to display all documents for a certain author, we would have to dig through all of our files to find all the matches.

Never fear — I have a proposal that will solve this problem. In fact, the rest of this chapter will be devoted to tackling this issue. With any luck, it will also give you some insights into the ways in which you can analyze requirements and come up with more architecturally sound XML designs.

The other problem is a little less obvious. If the same author's name is entered in three slightly different ways, then to our application these three names are different, and articles will thus be listed under three different authors.

To solve this problem, we should create a separate author listing. Once we have this figured out, we can get rid of the author element in all the other content types, and replace it with an authorid element. Handling our authors this way also allows us to track other information about authors, such as their email addresses, their bylines (in case they want to publish under pseudonyms), and other such information.

Instead of a separate author element, we would add an authorid element to our articles. All we need to do is use this author ID in our articles, news stories, and all other content we add to our CMS; this ID is used to look up the author and retrieve the information we need.
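The lookup described above might be sketched as follows. The structure of the author listing (an authors root with author elements carrying an id attribute, plus name and email children) is an assumption made for this example, as are the sample names; the original listing format is not shown in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical author listing: one entry per author, keyed by an id attribute.
authors = ET.fromstring("""<authors>
  <author id="1"><name>Jane Doe</name><email>jane@example.com</email></author>
  <author id="2"><name>John Roe</name><email>john@example.com</email></author>
</authors>""")

# Hypothetical article that stores only the author's ID.
article = ET.fromstring(
    "<article><authorid>2</authorid><headline>Sample</headline></article>"
)

def find_author(authors_root, author_id):
    """Return the author element with the given id, or None if there is no match."""
    return authors_root.find(f"author[@id='{author_id}']")

author = find_author(authors, article.findtext("authorid"))
print(author.findtext("name"), author.findtext("email"))
```

Because the name lives in one place only, correcting a misspelling in the listing fixes it for every article that carries that ID.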

The big question remains whether we need DTDs for our content at all. To be completely honest, most articles, news stories, and such will be submitted to the site through our administrative tool.

This tool will have the necessary forms that will restrict data entry to certain fields. In other words, our administrative tool will do most of the work of validating our content. However, I think it would be good practice to develop a DTD for our article content type — after all, this is one of the most important document types we have in our system, and it has to be done right.

Although we have declared our body element to contain character data, our article bodies will indeed be formatted using HTML tags. Try writing DTDs for the other content types as well.

Earlier, we used XSLT to transform an XML letter to mother into something that could be displayed in a browser window.

XPath is used in a variety of applications and technologies; however, XSLT is where its power and versatility really shine. For all intents and purposes, XPath is a query language. It uses a simple notation that is very similar to directory paths (hence the name XPath).

When we put together a template, we normally use XPath to establish a match. For example, we can always handle the root of an XML document with a template whose match value is a single slash (/). With XPath, you can select all elements that have a particular tag name. Or, you could match certain elements depending on their location within an XML file.

As you can see, the basic XPath syntax looks a lot like a file path on your computer.

But you can go a step further and set conditions on which elements are matched within your specified path. These conditions are called predicates, and appear within square brackets following the element name you wish to set conditions for. In a predicate such as [@priority='high'], the @ symbol identifies priority as an attribute name, not a tag name.
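Python's ElementTree supports a limited subset of XPath that includes attribute predicates, which is enough to try the idea out. The tasks document below is invented for illustration; only the predicate syntax comes from XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a priority attribute on each task.
doc = ET.fromstring("""<tasks>
  <task priority="high">Fix the build</task>
  <task priority="low">Tidy the wiki</task>
  <task priority="high">Renew the certificate</task>
</tasks>""")

# The predicate in square brackets keeps only tasks whose
# priority attribute (note the @) equals "high".
high = [task.text for task in doc.findall("task[@priority='high']")]
print(high)
```

Without the @ symbol, the predicate would instead look for a child element named priority.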

XPath also has a number of useful functions built in. For example, if you need to grab the first or last element of a series, you can use XPath to do so. Although most practical applications are relatively simple, XPath can get quite twisty when it needs to be. The XPath Recommendation is quite a useful reference to these areas of complexity.
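Grabbing the first or last element of a series can be tried with the positional predicates that ElementTree's XPath subset supports. The chapter document below is a made-up example; note that XPath positions start at 1, not 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter with three sections.
chapter = ET.fromstring("""<chapter>
  <section><title>Getting Started</title></section>
  <section><title>Going Deeper</title></section>
  <section><title>Wrapping Up</title></section>
</chapter>""")

# [1] selects the first section; [last()] selects the final one.
first_title = chapter.findtext("section[1]/title")
last_title = chapter.findtext("section[last()]/title")
print(first_title)
print(last_title)
```

Full XPath processors offer many more functions than this subset; the positional forms shown here are the ones ElementTree documents.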

Book chapters provide an excellent opportunity to understand the arbitrary complexity of most XML documents. From the perspective of an XML document designer, however, a book chapter can be intimidatingly complex. Chapters can have titles and sections, and those sections can have titles.

There are paragraphs throughout: some belong to the chapter (for example, introductory paragraphs), but others belong to sections. Sections can contain subsections. Paragraphs can contain text in italics, bold text, and other inline markup. In fact, one could even have different types of paragraphs, like notes, warnings, and tips. There are lots of possibilities for displaying these kinds of information.

This sample file could go on and on, but I think you get the idea. The first thing we want to do is to match the root of our document. Nothing could be simpler, right?


Viewed in a browser, our output will look something like that shown in Figure 4. Is there more to do? Of course there is. The title element near the top of the document is the chapter title, and should be handled differently from the title elements in the different nested sections.

Likewise, para elements that denote warnings or introductions should be handled differently from other paragraphs. To distinguish between these different title types, you can use XPath notation.

What about the paragraphs? Unlike the titles, they are not distinguishable by their placement in the document alone. Instead, the document uses the type attribute to distinguish normal paragraphs from introductions, tips, and warnings. Luckily, XPath lets us specify matches based on attribute values, too. In XPath, we use a predicate (a condition in square brackets) to match an attribute value.

We should definitely take advantage of this ability and distinguish each of our paragraph types visually. We can also make sure that warnings are displayed in red text. Note the priority attribute on this template. By default, XSL templates have a priority between -0.5 and 0.5. To make sure our introductory paragraphs will use this second template, we therefore assign a priority of 1.

How can we modify our template to display the actual chapter title in the title bar instead?

When you need to pull a simple piece of information out of the XML document without messing around with templates to process the element(s) that house it, you can use a value-of element to grab what you want with an XPath expression.

As you can see, the select attribute is an XPath expression that searches for the value of the title within the chapter. With value-of , we can print that value out.
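The idea behind value-of (evaluate a path, take the text of what it finds) has a close analogue in ElementTree's findtext. This is a Python stand-in rather than XSLT, and the chapter content is invented; it shows why a path anchored at the chapter picks the chapter title and not the section titles.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter: one title for the chapter, one inside a nested section.
chapter = ET.fromstring("""<chapter>
  <title>XPath Basics</title>
  <section>
    <title>Predicates</title>
  </section>
</chapter>""")

# "title" relative to the chapter matches only its direct child,
# much as a chapter/title path would in a select attribute.
chapter_title = chapter.findtext("title")
print(chapter_title)

# By contrast, iterating over every title finds the nested one as well.
all_titles = [t.text for t in chapter.iter("title")]
print(all_titles)
```

The difference between the two results is exactly the distinction between the chapter title and the section titles that the templates need to respect.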

Now our file displays something like the results shown in Figure 4. Notice the title bar of the browser window, which now contains the title of the chapter.

Figure 4. Viewing the chapter example with XPath.

In the preceding chapters, we gathered requirements for our XML files, administration tool, and display components.

In this case, we do so because it lowers maintenance costs. Notice that the action is set to a file called doSearch.

The most important page on the site is the homepage. The header will hold global navigation elements.

Like our search widget file, this navigation will be an include file; after all, we want to reuse these elements on other pages of the site. For the homepage of our site, the navigation menu will contain our search widget and a list of current news items. Our top navigation will be placed in an include file.

A detailed look at SimpleXML will have to wait, but imagine, for example, that you had a very simple XML document. Our first task is fairly simple. The code uses a regular expression to match the required file name pattern.

The method returns an array of elements that match the criteria specified; in this case that array will either contain a reference to the webcopy element in the file if the status is live , or it will be empty.
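The "list with one element or empty list" behaviour described above can be mimicked in Python. The original code uses PHP's SimpleXML; the sketch below is a swapped-in illustration with ElementTree, and it assumes (the text does not say) that status is stored as a child element of webcopy.

```python
import xml.etree.ElementTree as ET

def live_webcopy(xml_text):
    """Return [webcopy element] if its status is "live", otherwise an empty list.

    Assumes status is a child element of webcopy; this layout is a guess
    made for the example.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "webcopy" and root.findtext("status") == "live":
        return [root]
    return []

# Hypothetical sample files.
live_doc = "<webcopy><status>live</status><body>Welcome!</body></webcopy>"
draft_doc = "<webcopy><status>draft</status><body>Not yet.</body></webcopy>"

print(len(live_webcopy(live_doc)))
print(len(live_webcopy(draft_doc)))
```

Returning a list in both cases lets the calling code loop over the result without a special case for copy that is not live.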

The file for our homepage will be called index. This file includes both the common. It then goes on to produce the secondary navigation and content divs (navSide and mainContent, respectively). In this code, we open the file called homepage. The first include is the search widget that we built earlier on. For the most part, our news include file will be very similar in structure to the code we used in navtop.

SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed.
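The bookkeeping burden that SAX places on the application can be seen in a short sketch using Python's standard xml.sax module. The handler class and the sample document are invented for illustration; the point is that the handler itself must remember where it is in the document, because the parser only reports one event at a time.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect each title together with the path of elements leading to it."""

    def __init__(self):
        super().__init__()
        self.path = []      # stack of currently open element names
        self.titles = []    # (path, text) pairs found so far
        self._buffer = []   # text pieces of the title being read

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so we accumulate it.
        if self.path and self.path[-1] == "title":
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append(("/".join(self.path), "".join(self._buffer)))
        self.path.pop()

handler = TitleCollector()
xml.sax.parseString(
    b"<chapter><title>Intro</title>"
    b"<section><title>Details</title></section></chapter>",
    handler,
)
print(handler.titles)
```

Everything the handler knows about context (the open-element stack, the text buffer) is state it maintains by hand, which is the trade-off for SAX's speed and low memory use.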

In Figure 1, your elements show up clearly when viewed within Internet Explorer.

In our example, each person could define their own namespace and then prepend the name of their namespace to specific tags.

Because the XML declaration must be first in the file, if you plan to combine smaller XML files into a larger file, you might want to omit this optional information.

This example contains Armenian text.
