<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Dynamic Publisher</title>
	<atom:link href="http://www.thedynamicpublisher.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thedynamicpublisher.com</link>
	<description>Your Source for Dynamic Publishing News</description>
	<lastBuildDate>Thu, 17 May 2012 17:00:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Amazon CloudFront Offers Support For Dynamic Content</title>
		<link>http://www.thedynamicpublisher.com/2012/05/15/amazon-cloudfront-offers-support-for-dynamic-content/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=amazon-cloudfront-offers-support-for-dynamic-content</link>
		<comments>http://www.thedynamicpublisher.com/2012/05/15/amazon-cloudfront-offers-support-for-dynamic-content/#comments</comments>
		<pubDate>Wed, 16 May 2012 01:05:27 +0000</pubDate>
		<dc:creator>Scott Abel</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1843</guid>
		<description><![CDATA[Amazon has announced that Amazon CloudFront now allows organizations to deliver personalized web content dynamically. Read the announcement. Check out the geeky details.]]></description>
			<content:encoded><![CDATA[<p>Amazon has announced that <a href="http://aws.amazon.com/cloudfront/">Amazon CloudFront</a> now allows organizations to deliver personalized web content dynamically. </p>
<p><a href="http://aws.amazon.com/about-aws/whats-new/2012/05/13/amazon-cloudfront-now-supports-dynamic-content/">Read the announcement</a>. Check out the <a href="http://aws.typepad.com/aws/2012/05/amazon-cloudfront-support-for-dynamic-content.html">geeky details</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/05/15/amazon-cloudfront-offers-support-for-dynamic-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Migrating to DITA: How Automated Conversion Works and Why it Matters to You</title>
		<link>http://www.thedynamicpublisher.com/2012/05/03/migrating-to-dita-how-automated-content-conversion-works-and-why-it-matters-to-you/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=migrating-to-dita-how-automated-content-conversion-works-and-why-it-matters-to-you</link>
		<comments>http://www.thedynamicpublisher.com/2012/05/03/migrating-to-dita-how-automated-content-conversion-works-and-why-it-matters-to-you/#comments</comments>
		<pubDate>Thu, 03 May 2012 23:55:53 +0000</pubDate>
		<dc:creator>Scott Abel</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Slider]]></category>
		<category><![CDATA[automated conversion]]></category>
		<category><![CDATA[content conversion]]></category>
		<category><![CDATA[content migration]]></category>
		<category><![CDATA[DITA]]></category>
		<category><![CDATA[dynamic content]]></category>
		<category><![CDATA[manual conversion]]></category>
		<category><![CDATA[structured content]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1777</guid>
		<description><![CDATA[by Patrick Baker, VP Development and Professional Services, Stilo International An unavoidable part of moving to the Darwin Information Typing Architecture (DITA), or any other structured authoring system, is converting your existing content into the new format. Most organizations that make the move to structured writing have to make the change while still continuing to [...]]]></description>
			<content:encoded><![CDATA[<p>by Patrick Baker, VP Development and Professional Services, <a href="http://www.stilo.com/">Stilo International</a></p>
<p>An unavoidable part of moving to the <a href="http://www.ibm.com/developerworks/xml/library/x-dita1/">Darwin Information Typing Architecture</a> (DITA), or any other structured authoring system, is converting your existing content into the new format. Most organizations that make the move to structured writing have to make the change while still continuing to meet their regular delivery schedules. This means you need to convert content and get it up and running correctly in the new system between two product cycles, and usually without much in the way of added staff or resources.</p>
<div id="attachment_1787" class="wp-caption alignright" style="width: 310px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/detecting-patterns-content-conversion-e1336087340366.jpg" rel="shadowbox[sbpost-1777];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/detecting-patterns-content-conversion-e1336087340366.jpg" alt="" title="Access code" width="300" height="199" class="size-full wp-image-1787" /></a><p class="wp-caption-text"> </p></div>
<p>Content conversion is a key part of your migration strategy and the quality and completeness of that conversion is essential to getting the migration done within the constraints of your schedule. However, content conversion is a bit of a black box for many people. It is hard for writers and managers to anticipate how difficult the conversion process is going to be, how long it is going to take, how much it is going to cost, and how much cleanup of the output is going to be required after the conversion is done. The purpose of this article is to lift the lid off that black box. In particular, understanding how automated conversion works will help you form a more reasonable expectation about how your own conversion project is going to go, and what you can do to make it go more smoothly.</p>
<p>No automated conversion is ever 100% clean, but the difference between an 80% clean conversion and one that is 95% clean is huge – cleaning up 5% of the content instead of 20% means roughly a fourfold difference in cleanup costs. What makes the difference between 80% clean and 95% clean? Between 95% clean and 98% clean? Such outcomes certainly depend upon how well managed the conversion process is. However, having the right approach to content conversion is of critical importance.</p>
<h2>Knowledge is the key to intelligent content conversion</h2>
<p>There are three essential mechanisms that content conversion technology may leverage. They are:</p>
<ul>
<li>patterns</li>
<li>context</li>
<li>guided conversion</li>
</ul>
<p>When encoding content in a semantically rich format such as DITA, it is important to understand the meaning of the content in order to apply the correct tags. While people can understand the full meaning of the text they are reading, a computer does not, at least not very deeply. What a computer is exceedingly good at is recognizing patterns in the content. But patterns don’t provide the full solution. Patterns, when found in a given context, carry much more insight as to the meaning of the text in question. A sequence of 5 digits, for example, may represent a zip code, in the context of a US postal address, or an ICD-9 diagnostic code, in the health care sector. Guided conversion is supported by the provision of high-level mapping rules that hint at the current context so that patterns are interpreted correctly by the automated conversion tool. Compiling these hints depends on having an intimate familiarity with the document set destined for conversion. It is the content owners, armed with this content knowledge, who are best positioned to specify the mapping rules.</p>
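<p>To make the role of context concrete, here is a minimal Python sketch in which the same five-digit pattern is tagged differently depending on a context hint. The hint names, the element names, and the sample sentences are invented for illustration; they do not come from any particular conversion tool.</p>

```python
import re

# The pattern alone: any standalone run of exactly five digits.
FIVE_DIGITS = re.compile(r"\b\d{5}\b")

# Hypothetical mapping rules: a context hint selects the element name.
CONTEXT_TO_TAG = {
    "us-postal-address": "zip-code",
    "health-care": "icd9-code",
}


def tag_five_digit_codes(text, context):
    """Return (tag, value) pairs for each five-digit match.

    Without a known context the pattern is ambiguous, so nothing is tagged.
    """
    tag = CONTEXT_TO_TAG.get(context)
    if tag is None:
        return []
    return [(tag, match.group(0)) for match in FIVE_DIGITS.finditer(text)]


# Made-up sample inputs: same pattern, different meaning.
print(tag_five_digit_codes("Springfield, IL 62701", "us-postal-address"))
print(tag_five_digit_codes("Diagnosis code 25000 recorded", "health-care"))
```

<p>The mapping table stands in for the high-level rules a content owner would supply; the pattern itself never changes.</p>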
<h3>Patterns</h3>
<p>Patterns are everywhere in content. Patterns occur both in the content itself, and in the file format that contains the content. The foundation of all content conversion tools is the ability to recognize patterns. People also use patterns to recognize things in content.</p>
<p>For instance, a reader will immediately recognize what these numbers mean based on their pattern:</p>
<ul>
<li>9/30/12</li>
<li>+1 (613) 745-4242</li>
<li>$65.12</li>
</ul>
<p>Software can recognize these patterns as well, so if your target format requires semantic markup for dates, telephone numbers, or monetary amounts, a simple pattern-matching algorithm can find them and supply the markup. For example, a conversion program could recognize the sequence:</p>
<ul>
<li>“+” numbers space “(” number*3 “)” space number*3 “-” number*4</li>
</ul>
<p>It can then capture each number sequence in this pattern and write it out using whatever XML format you choose for phone numbers, for instance:</p>
<ul>
<li>&lt;phone-number country="1" area="613" exchange="745" number="4242"/&gt;</li>
</ul>
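<p>The pattern and output above can be sketched in a few lines of Python. This is an illustration only, using a regular expression and the standard library XML builder; the function name is an assumption, and a real conversion tool would express the rule in its own way.</p>

```python
import re
import xml.etree.ElementTree as ET

# Mirrors the sequence described above:
# "+" digits, space, "(" 3 digits ")", space, 3 digits, "-", 4 digits
NANP_PHONE = re.compile(r"\+(\d+) \((\d{3})\) (\d{3})-(\d{4})")


def phone_to_xml(text):
    """Return a phone-number element as a string, or None if no match."""
    match = NANP_PHONE.search(text)
    if match is None:
        return None
    country, area, exchange, number = match.groups()
    # Each captured digit sequence becomes one attribute.
    element = ET.Element(
        "phone-number",
        country=country,
        area=area,
        exchange=exchange,
        number=number,
    )
    return ET.tostring(element, encoding="unicode")


print(phone_to_xml("Call +1 (613) 745-4242 today."))
```

<p>Building the element with an XML library rather than by string concatenation means attribute quoting and escaping are handled for you.</p>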
<p>Of course, recognizing phone numbers is a bit more complicated than this. For one thing, people do not always include the country code when they write a phone number. People often omit the parentheses and the dash from the number, especially when the country code is used. This is one place where local knowledge of your content comes in – if you have a corporate style for phone numbers, you can tell your conversion software exactly what to look for. Otherwise, the conversion program can use multiple patterns to detect phone numbers in different formats.</p>
<p>Also, this pattern only works for North American phone numbers. Many other countries write their phone numbers differently. This is a case where we can use a context clue to improve our detection of phone numbers. For instance, we can use the country code to determine which pattern to expect. The following pattern detects a UK phone number:</p>
<ul>
<li>“+44” numbers-and-spaces</li>
</ul>
<p>UK phone numbers use a different format from North America, so our original pattern will not detect them correctly. A conversion program can detect phone numbers as a two-step process. First you detect the country code to determine which country the number belongs to, then you select a pattern appropriate to the chosen country to fully analyse the number.</p>
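<p>The two-step process described above might look like the following Python sketch. The table of patterns is hypothetical and covers only the two countries discussed here, the function name is an assumption, and the UK number in the usage lines is a made-up sample.</p>

```python
import re

# Hypothetical per-country patterns, keyed by country code.
PATTERNS = {
    # North American format, as in the earlier pattern
    "1": re.compile(r"\+1 \((\d{3})\) (\d{3})-(\d{4})"),
    # "+44" followed by digits and spaces
    "44": re.compile(r"\+44(?: ?\d)+"),
}


def detect_phone(text):
    """Return (country_code, matched_text), or None if nothing matches."""
    start = text.find("+")
    if start == -1:
        return None
    candidate = text[start:]
    # Step 1: determine the country code, trying longer codes first
    # so that "+44" is not mistaken for a shorter code.
    for code in sorted(PATTERNS, key=len, reverse=True):
        if candidate.startswith("+" + code):
            # Step 2: apply the pattern chosen for that country.
            match = PATTERNS[code].match(candidate)
            if match:
                return code, match.group(0)
    return None


print(detect_phone("Call +1 (613) 745-4242 today."))
print(detect_phone("Call +44 20 7946 0958 today."))
```

<p>Adding support for another country then means adding one entry to the table rather than rewriting the detection logic.</p>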
<p>You can expect support for matching common patterns, such as phone numbers, to be built in to conversion software. However, it should be easy to extend the system with new patterns specific to the vocabulary of a particular domain.</p>
<h3>Context</h3>
<p>Patterns, though they are an indispensable part of automated conversion, cannot on their own address the challenge of imparting to the content the depth of meaning, or understanding, required for the intelligent application of semantic markup. This is where context comes in.</p>
<p>For example, consider a list in an <a href="http://www.adobe.com/products/framemaker.html">Adobe FrameMaker</a> document. In FrameMaker, while a table is a distinct type of object, a list is not. In FrameMaker, you create a list simply by adding a bullet or number style to a set of paragraphs. The result is something that looks like a list in the output. However, the FrameMaker file format does not record the fact that the content is a list. The human eye can see the list in the output, but it is a little more challenging for a conversion program to figure out where a list begins and ends and what belongs to each item in a list.</p>
<p>Why does the conversion program have to figure out where the list begins and ends? Because most XML formats treat lists as distinct objects. When an XML document is styled, the style is generally applied to the list as a whole, rather than to the individual paragraphs in the list. This is usually the only way that an XML-based system provides for styling lists, so if the conversion software does not recognize the list in the source and create a proper XML list element in the output, chances are that the list will not be styled properly in the final output.</p>
<h2><strong>Example: a nested list</strong></h2>
<h3><strong>Quick-drop cookies</strong></h3>
<ol>
<li>Prepare the dough.<br />
a. Beat the egg in a large bowl.<br />
b. Add flour.<br />
c. Stir in milk.</li>
<li>Prepare the topping.<br />
a. Mix brown sugar and cinnamon in another bowl.</li>
<li>Form 1-inch round balls of dough.<br />
It is helpful to use a spoon when forming these balls.</li>
<li>Roll each ball in the topping.</li>
<li>Place each ball on an ungreased cookie sheet.</li>
</ol>
<p>Bake at 425 °F for 12 to 15 minutes.</p>
<p>This is the kind of construct that often occurs in complex procedures in technical documentation. The conversion program has to deal with multiple paragraphs within a single list item, as well as nested lists.</p>
<p>In this example, a paragraph that begins with a numeral indicates a first-level list item, while a paragraph beginning with a letter indicates a nested, second-level list item. An automated conversion should leverage this pattern to determine the logical nesting level of each item. Alternatively, it should identify nesting level by the indentation or styles that were used. Regardless, the conversion needs to track the current nesting level in order to ensure that the lists are properly opened and closed, and that each list item belongs to the correct list. For our example, this means emitting an opening &lt;ol&gt; each time we transition from an outer list item to a more deeply nested list item, and emitting a closing &lt;/ol&gt; when transitioning in the other direction. The correct output is:</p>
<pre>&lt;p&gt;Quick-drop cookies&lt;/p&gt;
 &lt;ol&gt;</pre>
<pre style="padding-left: 30px;">&lt;li&gt;Prepare the dough.</pre>
<pre style="padding-left: 60px;">&lt;ol&gt;</pre>
<pre style="padding-left: 90px;">&lt;li&gt;Beat the egg in a large bowl.&lt;/li&gt;</pre>
<pre style="padding-left: 90px;">&lt;li&gt;Add flour.&lt;/li&gt;</pre>
<pre style="padding-left: 90px;">&lt;li&gt;Stir in milk.&lt;/li&gt;</pre>
<pre style="padding-left: 60px;">&lt;/ol&gt;</pre>
<pre style="padding-left: 30px;">&lt;/li&gt;</pre>
<pre style="padding-left: 30px;">&lt;li&gt;Prepare the topping.</pre>
<pre style="padding-left: 60px;">&lt;ol&gt;</pre>
<pre style="padding-left: 90px;">&lt;li&gt;Mix brown sugar and cinnamon in another bowl.&lt;/li&gt;</pre>
<pre style="padding-left: 60px;">&lt;/ol&gt;</pre>
<pre style="padding-left: 30px;">&lt;/li&gt;</pre>
<pre style="padding-left: 30px;">&lt;li&gt;&lt;p&gt;Form 1-inch round balls of dough.&lt;/p&gt;</pre>
<pre style="padding-left: 60px;">&lt;p&gt;It is helpful to use a spoon when forming these balls.&lt;/p&gt;</pre>
<pre style="padding-left: 30px;">&lt;/li&gt;</pre>
<pre style="padding-left: 30px;">&lt;li&gt;Roll each ball in the topping.&lt;/li&gt;</pre>
<pre style="padding-left: 30px;">&lt;li&gt;Place each ball on an ungreased cookie sheet.&lt;/li&gt;</pre>
<pre> &lt;/ol&gt;</pre>
<pre>&lt;p&gt;Bake at 425 degrees Fahrenheit for 12 to 15 minutes.&lt;/p&gt;</pre>
<p>Note that the list markers (1., 2., a., etc.) have been removed by the conversion.</p>
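<p>A minimal Python sketch of this level tracking follows. It assumes the list markers are the only nesting clue (a numeral for the first level, a lowercase letter for the second), it places each nested list inside its parent item, and it wraps every first-level paragraph in a &lt;p&gt; for simplicity. The function name is hypothetical.</p>

```python
import re
import xml.etree.ElementTree as ET

NUMBERED = re.compile(r"^\d+\.\s+(.*)$")    # first-level marker, e.g. "1."
LETTERED = re.compile(r"^[a-z]\.\s+(.*)$")  # second-level marker, e.g. "a."


def build_list(paragraphs):
    """Convert marked paragraphs into a nested ol/li tree."""
    root = ET.Element("ol")
    current_item = None  # most recent first-level item
    nested = None        # currently open second-level list, if any
    for para in paragraphs:
        numbered = NUMBERED.match(para)
        lettered = LETTERED.match(para)
        if numbered:
            # Back at level 1: any nested list is now closed.
            nested = None
            current_item = ET.SubElement(root, "li")
            ET.SubElement(current_item, "p").text = numbered.group(1)
        elif lettered and current_item is not None:
            # Moving to level 2: open a nested list on first use.
            if nested is None:
                nested = ET.SubElement(current_item, "ol")
            ET.SubElement(nested, "li").text = lettered.group(1)
        elif current_item is not None:
            # Unmarked paragraph: continuation of the current item.
            ET.SubElement(current_item, "p").text = para
        else:
            raise ValueError("paragraph appears before any list item")
    return root


cookies = build_list([
    "1. Prepare the dough.",
    "a. Beat the egg in a large bowl.",
    "b. Add flour.",
    "c. Stir in milk.",
    "2. Prepare the topping.",
    "a. Mix brown sugar and cinnamon in another bowl.",
    "3. Form 1-inch round balls of dough.",
    "It is helpful to use a spoon when forming these balls.",
    "4. Roll each ball in the topping.",
    "5. Place each ball on an ungreased cookie sheet.",
])
print(ET.tostring(cookies, encoding="unicode"))
```

<p>Because the markers are captured by the patterns and only the remaining text is written out, they disappear from the output, as noted above.</p>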
<h3><strong>Guided Conversion</strong></h3>
<p>So, how can we establish the appropriate context of a given piece of content? The most reliable authority on this is the content owner who is familiar with the content.  A mechanism is required which enables the content owner to easily express what the correct context is for any document content. This must be a high-level interface that does not require the user to be a programmer or technical expert.</p>
<h2><strong>Example: task steps</strong></h2>
<p>Upon further reflection, the markup provided by the previous example is not ideal. An improved DITA markup of these instructions for preparing the quick-drop cookies would use steps within a task topic. But, to target a semantically rich content model such as a DITA task, a conversion tool requires guidance. Such guidance may be provided by means of annotations attached to portions of the content, as illustrated in the table below.</p>
<div id="attachment_1797" class="wp-caption aligncenter" style="width: 606px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-03-at-4.49.42-PM.png" rel="shadowbox[sbpost-1777];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-03-at-4.49.42-PM.png" alt="" title="Table cookie baking instructions" width="596" height="285" class="size-full wp-image-1797" /></a><p class="wp-caption-text"> </p></div>
<p>The task title annotation can be based on the formatting properties of bold and underline. The annotation of step level 1 or 2 can be based on the presence of the list markers or the indentation level of the text. The tip might be recognized by the paragraph styling. The conversion should be smart enough to try to fit the last sentence into a task in a way that makes sense, in a way that is permitted by the DITA task content model. The elements &lt;result&gt;, &lt;example&gt; and &lt;postreq&gt; are good candidates. A preference can be set for the documentation set, and in this case &lt;postreq&gt; is the best choice.</p>
<p>Guided by these annotations, the conversion software should produce the following output:</p>
<pre>&lt;task&gt;</pre>
<pre style="padding-left: 30px;">&lt;title&gt;Quick-drop cookies&lt;/title&gt;
&lt;taskbody&gt;</pre>
<pre style="padding-left: 60px;">&lt;steps&gt;</pre>
<pre style="padding-left: 90px;">&lt;step&gt;</pre>
<pre style="padding-left: 120px;">&lt;cmd&gt;Prepare the dough.&lt;/cmd&gt;</pre>
<pre style="padding-left: 120px;">&lt;substeps&gt;</pre>
<pre style="padding-left: 150px;">&lt;substep&gt;&lt;cmd&gt;Beat the egg in a large bowl.&lt;/cmd&gt;</pre>
<pre style="padding-left: 150px;">&lt;/substep&gt;</pre>
<pre style="padding-left: 150px;">&lt;substep&gt;&lt;cmd&gt;Add flour.&lt;/cmd&gt;&lt;/substep&gt;</pre>
<pre style="padding-left: 150px;">&lt;substep&gt;&lt;cmd&gt;Stir in milk.&lt;/cmd&gt;&lt;/substep&gt;</pre>
<pre style="padding-left: 120px;">&lt;/substeps&gt;</pre>
<pre style="padding-left: 90px;">&lt;/step&gt;</pre>
<pre style="padding-left: 90px;">&lt;step&gt;</pre>
<pre style="padding-left: 120px;">&lt;cmd&gt;Prepare the topping.&lt;/cmd&gt;</pre>
<pre style="padding-left: 120px;">&lt;substeps&gt;</pre>
<pre style="padding-left: 150px;">&lt;substep&gt;</pre>
<pre style="padding-left: 180px;">&lt;cmd&gt;Mix brown sugar and cinnamon in another bowl.</pre>
<pre style="padding-left: 180px;">&lt;/cmd&gt;</pre>
<pre style="padding-left: 150px;">&lt;/substep&gt;</pre>
<pre style="padding-left: 120px;">&lt;/substeps&gt;</pre>
<pre style="padding-left: 90px;">&lt;/step&gt;</pre>
<pre style="padding-left: 90px;">&lt;step&gt;</pre>
<pre style="padding-left: 120px;">&lt;cmd&gt;Form 1-inch round balls of dough.&lt;/cmd&gt;</pre>
<pre style="padding-left: 120px;">&lt;info&gt;&lt;note type="tip"&gt;It is helpful to use a spoon when forming these balls.&lt;/note&gt;&lt;/info&gt;</pre>
<pre style="padding-left: 90px;">&lt;/step&gt;</pre>
<pre style="padding-left: 90px;">&lt;step&gt;&lt;cmd&gt;Roll each ball in the topping.&lt;/cmd&gt;&lt;/step&gt;</pre>
<pre style="padding-left: 90px;">&lt;step&gt;&lt;cmd&gt;Place each ball on an ungreased cookie sheet.&lt;/cmd&gt;</pre>
<pre style="padding-left: 90px;">&lt;/step&gt;</pre>
<pre style="padding-left: 60px;">&lt;/steps&gt;</pre>
<pre style="padding-left: 60px;">&lt;postreq&gt;Bake at 425 degrees Fahrenheit for 12 to 15 minutes.</pre>
<pre style="padding-left: 60px;">&lt;/postreq&gt;</pre>
<pre style="padding-left: 30px;">&lt;/taskbody&gt;</pre>
<pre>&lt;/task&gt;</pre>
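<p>As a rough illustration of how annotations can drive this output, the Python sketch below turns a sequence of (annotation, text) pairs into the task structure shown above. The annotation names (title, step1, step2, tip, postreq) and the function name are assumptions made for this example, and the sketch expects the title to come first.</p>

```python
import xml.etree.ElementTree as ET


def build_task(annotated):
    """Build a DITA task element from (annotation, text) pairs."""
    task = ET.Element("task")
    body = steps = step = substeps = None
    for role, text in annotated:
        if role == "title":
            ET.SubElement(task, "title").text = text
            body = ET.SubElement(task, "taskbody")
        elif role == "step1":
            if steps is None:
                steps = ET.SubElement(body, "steps")
            step = ET.SubElement(steps, "step")
            ET.SubElement(step, "cmd").text = text
            substeps = None  # a new step closes any open substeps
        elif role == "step2":
            if substeps is None:
                substeps = ET.SubElement(step, "substeps")
            substep = ET.SubElement(substeps, "substep")
            ET.SubElement(substep, "cmd").text = text
        elif role == "tip":
            # Tips attach to the current step as a note inside info.
            info = ET.SubElement(step, "info")
            ET.SubElement(info, "note", type="tip").text = text
        elif role == "postreq":
            ET.SubElement(body, "postreq").text = text
        else:
            raise ValueError("unknown annotation: " + role)
    return task


task = build_task([
    ("title", "Quick-drop cookies"),
    ("step1", "Prepare the dough."),
    ("step2", "Beat the egg in a large bowl."),
    ("step2", "Add flour."),
    ("step2", "Stir in milk."),
    ("step1", "Prepare the topping."),
    ("step2", "Mix brown sugar and cinnamon in another bowl."),
    ("step1", "Form 1-inch round balls of dough."),
    ("tip", "It is helpful to use a spoon when forming these balls."),
    ("step1", "Roll each ball in the topping."),
    ("step1", "Place each ball on an ungreased cookie sheet."),
    ("postreq", "Bake at 425 degrees Fahrenheit for 12 to 15 minutes."),
])
print(ET.tostring(task, encoding="unicode"))
```

<p>The point of the sketch is that the conversion logic stays generic; all of the knowledge specific to the document sits in the annotations.</p>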
<h3><strong>Typical problems to look out for </strong></h3>
<p>Here are some examples of the types of conversion issues that cause problems for conversion solutions that do not make full and integrated use of patterns, context, and guided conversion.</p>
<h3><strong>Multiple sets of steps within a task topic</strong></h3>
<p>A DITA task topic must contain only one procedure. However, many existing user guides are not written that way, and may have more than one procedure in a section. If you are converting sections into topics, and a section has more than one procedure, the conversion software needs to do something to produce valid output that includes both procedures.</p>
<p>Some control of context is required even to recognize that this problem exists. A conversion that depended solely on pattern matching would not even notice that it was creating an illegal second procedure. For a conversion tool to avoid this error, it has to be aware of the context of the procedure, not only in the input it is reading, but in the output it is creating.</p>
<p>Though the content cannot be automatically re-authored, the conversion software can insert an empty task &lt;title&gt; based on context, effectively breaking the topic into two tasks. This allows the conversion software to apply the semantically correct &lt;step&gt; and &lt;cmd&gt; markup to the content of the second procedure. The user still needs to provide the proper text for the title of the second procedure, post-conversion, but this is much quicker and easier, and less error-prone, than re-authoring the topic, either in the input or in DITA.</p>
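<p>A small Python sketch of this splitting rule is shown below. It assumes the procedures in a section have already been detected and are passed in as lists of step text; the function name and the sample section are invented for illustration.</p>

```python
import xml.etree.ElementTree as ET


def section_to_tasks(title, procedures):
    """Return one task element per procedure found in a section."""
    tasks = []
    for index, commands in enumerate(procedures):
        task = ET.Element("task")
        # Only the first task inherits the section title; later tasks get
        # an empty title for a writer to fill in after conversion.
        ET.SubElement(task, "title").text = title if index == 0 else ""
        body = ET.SubElement(task, "taskbody")
        steps = ET.SubElement(body, "steps")
        for command in commands:
            step = ET.SubElement(steps, "step")
            ET.SubElement(step, "cmd").text = command
        tasks.append(task)
    return tasks


# Made-up section containing two procedures.
tasks = section_to_tasks(
    "Replacing the filter",
    [
        ["Open the cover.", "Remove the old filter."],
        ["Insert the new filter.", "Close the cover."],
    ],
)
for task in tasks:
    print(ET.tostring(task, encoding="unicode"))
```

<p>Each resulting task holds exactly one set of steps, so the second procedure keeps its step and command markup instead of being flattened.</p>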
<h3><strong>Procedures authored as a table</strong></h3>
<p>A number of organizations use tables to lay out the steps of a procedure. For a generic conversion program, this structure is going to look like a table, not a procedure, and the result will be that the content will come out as a table rather than a task in the DITA XML, which is not what you want.</p>
<p>Guided conversion can identify such tables based on, for example, the content of the first column (Step 1, Step 2, etc.) or the header row, or possibly the table style. The identified tables can be stripped of their table markup, and their contents automatically mapped into step commands, info, examples, etc. Again, the paragraphs can be identified based on the fact that they were contained in such a table, so there is no need to rely on styles.</p>
<p>Tables that contain definition lists, advisories, or any other content, can be similarly identified and stripped of their table markup.</p>
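<p>The first-column rule can be sketched in Python as follows. The sketch assumes each table row is available as a list of cell strings with the header row already removed, and that the second column holds the command; the function names and sample rows are hypothetical.</p>

```python
import re
import xml.etree.ElementTree as ET

STEP_LABEL = re.compile(r"^Step \d+$")


def is_procedure_table(rows):
    """True when every first-column cell looks like 'Step N'."""
    return bool(rows) and all(STEP_LABEL.match(row[0]) for row in rows)


def table_to_steps(rows):
    """Strip the table structure and map each row to a step."""
    steps = ET.Element("steps")
    for _label, command, *details in rows:
        step = ET.SubElement(steps, "step")
        ET.SubElement(step, "cmd").text = command
        # Any further non-empty cells become supporting info.
        for detail in details:
            if detail:
                ET.SubElement(step, "info").text = detail
    return steps


# Made-up table rows: label, command, optional extra detail.
rows = [
    ["Step 1", "Open the cover.", ""],
    ["Step 2", "Remove the old filter.", "Dispose of it safely."],
]
if is_procedure_table(rows):
    print(ET.tostring(table_to_steps(rows), encoding="unicode"))
```

<p>A table that fails the check would simply be passed through as an ordinary table.</p>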
<h3><strong>Conditional text</strong></h3>
<p>Some conversion tools have trouble working with files that contain conditional text. Sometimes the tool requires that all conditions be turned on before conversion, and then it loses the conditions in the output.</p>
<p>Guided conversion should be used to specify a rule which indicates how different conditions in the source content map to XML. The conversion rule can target the DITA otherprops attribute, or a specialization of the props filtering attribute, for the capture of the conditional information. A guided conversion rule could also cause conditions of a specified type to lead to the creation of entries in the relationship table of the DITA map.</p>
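<p>As an illustration of such a rule, the Python sketch below maps source condition names onto values of the otherprops attribute. The condition names, the attribute values, and the function name are all invented for this example.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: source condition name -> otherprops value.
CONDITION_RULES = {
    "ProEdition": "pro",
    "PrintOnly": "print",
}


def paragraph_with_conditions(text, conditions):
    """Create a p element that carries its conditions in otherprops."""
    paragraph = ET.Element("p")
    paragraph.text = text
    # An unmapped condition raises KeyError rather than being lost silently.
    values = [CONDITION_RULES[name] for name in conditions]
    if values:
        # Multiple values are written as a space-separated list.
        paragraph.set("otherprops", " ".join(values))
    return paragraph


element = paragraph_with_conditions(
    "Export the report as a spreadsheet.", ["ProEdition", "PrintOnly"]
)
print(ET.tostring(element, encoding="unicode"))
```

<p>Failing loudly on an unmapped condition is a deliberate choice here: it surfaces gaps in the rules during conversion rather than during cleanup.</p>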
<h3><strong>Constructing book and map files</strong></h3>
<p>While the aim of a conversion to DITA is to be able to reuse topics in many places, the first place you are probably going to want to use your converted topics is in the same book they came from. That means you will need a ditamap and/or bookmap that reproduces the structure of the converted book. Your conversion tool should be able to produce the required ditamap and bookmap for you.</p>
<p>Discerning the hierarchy is not always as simple as matching heading levels. Not every heading marks a change in hierarchy, and authors do not always use headings in strict hierarchical sequence. Additionally, different topic divisions may be indicated by the use of different heading types. Managing all of these issues requires sophisticated management of context informed by a detailed knowledge of the content and the style conventions that were used to create it.</p>
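<p>For the simplest case, where heading levels do reflect the hierarchy, building the map can be sketched in Python as below. The function name and file names are hypothetical, and the rule for skipped levels (attach the topic one level below its predecessor) is just one possible policy; real content would need the guided rules discussed here.</p>

```python
import xml.etree.ElementTree as ET


def build_map(title, headings):
    """Build a map from (level, href) pairs in document order."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    # stack[i] is the element that topics of level i + 1 attach to.
    stack = [root]
    for level, href in headings:
        # Clamp skipped levels (for example, 1 followed by 3) so the
        # topic nests directly under the previous one.
        level = max(1, min(level, len(stack)))
        del stack[level:]
        topicref = ET.SubElement(stack[-1], "topicref", href=href)
        stack.append(topicref)
    return root


# Made-up heading sequence with one skipped level at the end.
ditamap = build_map(
    "User Guide",
    [
        (1, "intro.dita"),
        (2, "install.dita"),
        (2, "configure.dita"),
        (1, "reference.dita"),
        (3, "errors.dita"),
    ],
)
print(ET.tostring(ditamap, encoding="unicode"))
```

<p>The stack is what carries the context: each new heading is placed relative to the headings that came before it, not just according to its own level.</p>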
<p>Another important issue is discovering the book information such as publication date, document number, etc. For some organizations, this may involve the creation of a customized bookmap, if the standard DITA bookmap does not capture all of the publication information the organization uses.</p>
<p>This information is not always easy to find in the source files. No generic conversion software can ever accurately detect, extract, and preserve this publication information, since its format and location are always specific to an individual organization. However, with guided conversion pinpointing the location, pattern, and context of this information, a conversion tool can build the correct map.</p>
<p>In some cases, important metadata is found in the headers and footers rather than the main text flow of the document. Once again, guided conversion can pinpoint the data of interest and relate it correctly to the ditamap and bookmap files you are building.</p>
<div id="attachment_1799" class="wp-caption alignright" style="width: 310px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/Strategy-sign-e1336089205159.jpg" rel="shadowbox[sbpost-1777];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/05/Strategy-sign-e1336089205159.jpg" alt="" title="Strategy Green Road Sign with Copy Room Over The Dramatic Clouds and Sky." width="300" height="199" class="size-full wp-image-1799" /></a><p class="wp-caption-text"> </p></div>
<h3><strong>Choosing your conversion strategy</strong></h3>
<p>Knowledge, as <a href="http://jgollner.typepad.com/files/the-anatomy-of-knowledge-jgollner-sept-2006.pdf">defined by Joe Gollner</a>:</p>
<p style="padding-left: 30px;"><em>“Knowledge is the meaningful organization of information, expressing an evolving understanding of a subject and establishing a basis for judgment and the potential for action.”</em></p>
<p>The level of success that an automated conversion technology can hope to achieve is bounded by the depth of knowledge it can attain of the content to be converted. Context, supported by guided conversion, provides for the <em>meaningful organization</em> of the <em>information</em> revealed by patterns. The conversion software can <em>act</em> on this <em>evolved understanding</em> of your content to produce the richest XML possible. Knowledge is the key to intelligent content conversion.</p>
<p>Because intimate familiarity with the content is so important to specifying the patterns and the context that will produce a high quality conversion that requires little cleanup, you probably don’t want to simply send your files away to be converted. Without your specialist knowledge to supply the patterns and context clues, the conversion you get back is going to be pretty generic, and that is going to mean you will have to do a lot of manual cleanup before the content is really usable.</p>
<p>On the other hand, the people with this knowledge are writers and editors in your organization, and they generally don’t know how to express these kinds of context clues in a programming language. Trying to learn to do conversion programming, so that you can write your own conversions that exploit your knowledge of the content, is going to be even more time consuming than cleaning up all the problems left by a generic conversion.</p>
<p>To get the best of both worlds, you need to work with a conversion service provider who understands the importance of patterns, context, and knowledge of the content in the conversion process, and who will work with you to define the conversion rules that will greatly improve the quality of your conversion output, and thus save you weeks or months of cleanup effort. You need a conversion service provider that possesses the intelligent conversion tools that allow you to capture and express all the context recognition rules in a high-level human-readable way, without the need for programming or technical expertise.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/05/03/migrating-to-dita-how-automated-content-conversion-works-and-why-it-matters-to-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How Dynamic Publishing Can Make Your Customers Love You</title>
		<link>http://www.thedynamicpublisher.com/2012/04/18/how-dynamic-publishing-can-make-your-content-less-expensive-to-create-and-make-your-customers-love-you/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=how-dynamic-publishing-can-make-your-content-less-expensive-to-create-and-make-your-customers-love-you</link>
		<comments>http://www.thedynamicpublisher.com/2012/04/18/how-dynamic-publishing-can-make-your-content-less-expensive-to-create-and-make-your-customers-love-you/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 21:51:29 +0000</pubDate>
		<dc:creator>Noz Urbina</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[crowd-sourcing]]></category>
		<category><![CDATA[dynamic content]]></category>
		<category><![CDATA[dynamic publishing]]></category>
		<category><![CDATA[folksonomy]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1551</guid>
		<description><![CDATA[by Noz Urbina, Senior Consultant, Trainer and Presales Manager for Mekon Ltd. Here&#8217;s the problem. Marketing and technical communication teams are overloaded. Faced with: Multiple delivery formats (print, web, online help, and now, myriad mobile formats) Increasing demand for translation Increasing demand for personalization and localization of content Shortening product release cycles with more product [...]]]></description>
			<content:encoded><![CDATA[<p>by Noz Urbina, Senior Consultant, Trainer and Presales Manager for Mekon Ltd.</p>
<p>Here&#8217;s the problem. Marketing and technical communication teams are overloaded.</p>
<p>Faced with:</p>
<ul>
<li>Multiple delivery formats (print, web, online help, and now, myriad mobile formats)</li>
<li>Increasing demand for translation</li>
<li>Increasing demand for personalization and localization of content</li>
<li>Shortening product release cycles with more product variants, meaning more duplicate content that needs to be kept up to date</li>
<li>Completely new channels and content sources to tackle (social media, syndication, user-generated content)</li>
</ul>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/camel.jpg" rel="shadowbox[sbpost-1551];player=img;"><img class="alignright size-full wp-image-1600" title="Dromadaire" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/camel-e1334620470780.jpg" alt="" width="200" height="260" /></a>It seems we’re the camels, and the market has no shortage of straw to pile on our backs.  The solution can’t be “work harder”, because increases in communication budgets have not been proportionate to increasing demand on us. Because we can’t allocate additional resources to tackle these challenges, we must work smarter.</p>
<p>This means leveraging automation to create all these personalised deliverables with as little manual effort as possible. We just can’t keep up using outdated methods where we convert from format to format. Our users don’t want content on a website, or a microsite, or a portal, or PDF, or a mobile app &#8211; <em>they want it on them all</em>.</p>
<p>By dynamically assembling content on request, and automatically publishing it out in the format of the user’s preference, we can focus our precious human resources on the challenges that need humans, like information design, architecture and of course writing. For the grunt work of content assembly, layout, formatting, and publishing, we can leverage cheap, dynamic, automatic processes. This doesn’t just require new software or technology, but a new process and new attitudes towards publishing, so that content can be dynamically processed while still maintaining quality of output.</p>
<p>Going outside the comfort zone of some readers, I am going to illustrate some simple lessons that can be applied across various verticals and industries by using an ‘extreme case’ of advanced technical communications as an example. For nearly two decades, this field has been pioneering dynamic methodologies to address diverse customers faster and across various formats.</p>
<p><strong>Why Should We Look at TechComm</strong></p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/complexdocument.jpg" rel="shadowbox[sbpost-1551];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/complexdocument-e1334784539880.jpg" alt="" title="Old geometry textbook" width="200" height="150" class="alignleft size-full wp-image-1657" /></a> We usually think of technical communication as ‘the manuals’ (which no one wants to read), but it is so much more. Smart techcomm focuses on facilitating the customer’s experience of the product or service that the published content supports. Techcomm publishers design personas and key messages, and establish content strategies and workflows, just as any other type of publisher does. Also, like all publishers, techcomm leverages social media and dynamic systems to give users what they want, when and where they want it.</p>
<p><strong>What Customers Want</strong></p>
<p>What customers want is the knowledge that is trapped in the heads (and on the hard drives) of product experts like engineers, technical communicators, trainers, and especially, their peers. Websites, reviews, manuals &#8212; even ‘content’ itself &#8212; are all a means to an end. Your customers want to be able to do something with your products. They may want to buy, evaluate, install, clean, use, repair, or decommission them. To do what they desire, they need you to  transfer the knowledge you have locked inside your walled gardens and give it to them &#8212; quickly, so they can get back to whatever it was they were trying to do before they needed your help.</p>
<p>The social media explosion of the past decade – which shows no signs of ebbing – has shown us that communities can produce a lot of content.  The questions remain: can we get them to produce and distribute usable, useful product knowledge, and if so, how can we leverage that knowledge?</p>
<p>* Or government body, or association.</p>
<p><strong>What Can The Crowd Do For You?</strong></p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/90-percent-10-percent.jpg" rel="shadowbox[sbpost-1551];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/90-percent-10-percent-e1334784725839.jpg" alt="" title="Blue Pie Chart 10 - 90 percent" width="200" height="150" class="alignright size-full wp-image-1663" /></a><strong>They can write for you</strong></p>
<p>Usability and web specialist <strong>Jakob Nielsen</strong> has detailed the fact that <a href="http://www.useit.com/alertbox/participation_inequality.html">the vast majority of consumers will never become content producers</a>.  However, when dealing with large numbers of users with large numbers of demands, the contributions they make do not need to be substantial; they need only to be enough.  And users know it.</p>
<p>&#8220;Official&#8221; product content &#8212; that content created or published by the enterprise &#8212; sometimes has a reputation for being incomplete, difficult to navigate or otherwise unhelpful. In all my research and field experience, I’ve found most consumers don&#8217;t care who creates the content, as long as it helps them accomplish their goal.  As we’ll see below, the nature of the user contributions is also different, making them disproportionately valuable.</p>
<p><strong>The lesson:</strong> Make it easier for users to extend and add value to your content.  When publishing, your platform needs to allow users to easily contribute content. That content will also need to be structured and wrapped in as much metadata as possible. Your platform will need to help you sort and curate user contributions to find both good contributions and top contributors.</p>
<p>You find these by filtering the content based on metadata (a taxonomy).</p>
<p><strong>Users will build your taxonomy (and therefore links, filters and navigation) for you</strong></p>
<p>A taxonomy is a labeling and categorization system.  It defines data about your content (metadata), and lets you filter, search and relate content.</p>
<p>Users will tag their own content if properly encouraged (again, see <a href="http://www.useit.com/alertbox/participation_inequality.html">Nielsen’s advice</a> on making things easy to encourage participation), building up a <a href="http://www.nytimes.com/2005/12/11/magazine/11idea">‘folksonomy’</a> (a <a href="http://en.wikipedia.org/wiki/Taxonomy">taxonomy</a> built by ‘the folks&#8217; that use the content).</p>
<p>Often we think we must choose between a taxonomy or folksonomy.  The truth is that the two play very well together.  Look at Amazon:</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-29-at-8.09.21-AM.png" rel="shadowbox[sbpost-1551];player=img;"><img class="aligncenter size-full wp-image-1553" title="Amazon Tags Example" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-29-at-8.09.21-AM.png" alt="" width="582" height="256" /></a></p>
<p>As we know, Amazon’s taxonomy of things like ‘Home electronics’ vs. ‘Books’ vs. ‘Home, Garden &amp; Tools’ is an integral and vital way of navigating a vast amount of content.  However, they know that their customers are always right, and that no classification and navigation system they provide will ever (on its own) equal what they can do if they leverage the power of the crowd. Amazon lets users add their own tags. And, they allow their users to search and navigate the site using those tags, in combination with the classification system the company provides.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/tag-cloud.jpg" rel="shadowbox[sbpost-1551];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/tag-cloud-e1334784961625.jpg" alt="" title="Natural Disasters" width="200" height="129" class="alignleft size-full wp-image-1669" /></a>Leveraging user-generated tags is necessary because it helps solve several problems. First, tags created by users provide a wealth of information about our content. In order to build content classification systems of value, we must know what terms our customers use to describe our products, services and content. Second, because those who work for us creating official content are often resistant to adding metadata tags, leveraging the crowd to build folksonomies can help us provide the right content to the right people at the right time and in the right language. Search engines need this additional information to help customers find what they are looking for and dynamic publishing engines require tags to deliver content to those who need it, when and where they need it.</p>
<p><strong>The lesson:</strong> Make it possible for your users to build a folksonomy in all your online channels. Curate those terms and glean business-critical information from them. This means enabling user tagging on their own content &#8212; and on yours &#8212; and having clear guidelines indicating what user-generated terms may ‘graduate’ from the folksonomy into the official taxonomy.</p>
<p>This will enable analytics and reporting, which allow you to monitor what’s happening to and around your content.  With this metadata in your arsenal, you will be able to sort by products and subjects, locate top contributing users and top rated content.</p>
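<p>As a rough illustration of how user-generated terms might ‘graduate’ into the official taxonomy, here is a minimal Python sketch. The function name <strong>promote_tags</strong>, the <strong>min_uses</strong> threshold and the sample tags are all hypothetical, invented for this example; a real platform would combine counts like these with editorial review.</p>

```python
from collections import Counter

def promote_tags(user_tags, official_taxonomy, min_uses=3):
    """Return (tag, count) pairs that are candidates for the official taxonomy.

    Hypothetical rule: a user tag is a candidate once it has been applied
    at least `min_uses` times and is not already an official term.
    """
    # Normalize case and whitespace so "E-Reader" and "e-reader" count together.
    counts = Counter(tag.strip().lower() for tag in user_tags if tag.strip())
    official = {term.strip().lower() for term in official_taxonomy}
    candidates = [
        (tag, n) for tag, n in counts.items()
        if n >= min_uses and tag not in official
    ]
    # Most-used first; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates

# Made-up sample data for illustration only.
sample_tags = ["e-reader", "E-Reader", "kindle", "e-reader",
               "kindle", "kindle", "books", "gift"]
print(promote_tags(sample_tags, {"Books"}))
```

<p>The output is only a shortlist for a human curator; the threshold is a starting point to tune, not a rule.</p>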
<p><strong>Users can know things you can’t </strong></p>
<p>User-generated content is created by users specifically for what users want and/or need.  Technical communicators try to do this as well, but users are in the privileged position of having the enterprise’s published content as a starting point.  They also have real-world field experience with your products before they start writing, an opportunity many enterprise staff will never have.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/bicyclerepair.jpg" rel="shadowbox[sbpost-1551];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/04/bicyclerepair-e1334785141688.jpg" alt="" title="service for bike with adept repairing bike" width="200" height="133" class="alignright size-full wp-image-1673" /></a>For general publishing, often we find that when content ‘goes live’ we discover that it was not properly designed for users’ needs. Similarly, in techcomm, it is only when a user first puts the product and all its supporting content into use that the gaps are found.</p>
<p>In a recent content strategy audit, we looked at product content generated outside of the organization, specifically, at common user search queries coming in through the website, and questions and answers posted to forums. We found that the users were getting lots of facts, but not the task-based and conceptual overview information they were looking for.</p>
<p>Because it is often derived from engineering specifications and product management documents, technical communication content tends to be excessively reference-based, giving people more of a ‘dictionary of product data’ than true ‘user guides’ or ‘how to manuals’.</p>
<p><strong>The lesson:  </strong>When dynamic methods are not in place, staff get overloaded just trying to hit deadlines and prepare reviewed, nicely formatted deliverables.</p>
<p>All publishers need to plan to learn from users, and promote their work.  Time is money. Any financial adviser will tell you to always put away 10% at the beginning of the month, no matter how little you have left.  You’ll find a way to make do with the 90%, and be far better off in the long term.  Plan a percentage of time at the beginning of every project to research the community’s needs by watching their output.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-29-at-8.10.13-AM.png" rel="shadowbox[sbpost-1551];player=img;"><img class="aligncenter size-full wp-image-1554" title="Twitter" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-29-at-8.10.13-AM.png" alt="" width="560" height="150" /></a></p>
<p>You can &#8212; and should &#8212; have a process in place for finding, capturing, validating, and reworking users’ knowledge until it fits alongside ‘official’ content.</p>
<p><strong>Pulling it all together</strong></p>
<p>By enabling user creation and tagging of content, and building analysis and curation of user content into your process, you will be able to leverage more content than you could ever have the capacity to produce yourself.</p>
<p>XML (oftentimes of the DITA flavor) will be required to make all of this a reality. XML authoring environments make it possible to create semantically-enabled content that can go anywhere, and take its metadata with it. As such, the platform you choose will likely be an XML-based system and require a rethink of information design and editorial processes to enable dynamic publishing of the end deliverables.</p>
<p>By leveraging your folksonomy and taxonomy, you can dynamically create related links from your content to user-generated content and back again. To illustrate, think about how, when <a href="http://www.reuters.com/">Reuters</a> sells news on “Middle East oil fields” to a 3<sup>rd</sup> party news site, that 3<sup>rd</sup> party site can automatically link that content to its other “Middle East” and “Oil” content not sourced from Reuters – simply by matching up the metadata tags. It’s a simple feature of dynamic systems, but highly impactful.</p>
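<p>To make the tag-matching idea concrete, here is a small Python sketch. It assumes each piece of content carries a simple list of tags; the function name <strong>related_items</strong> and the sample library are hypothetical and stand in for whatever your publishing system actually stores.</p>

```python
def related_items(article_tags, library, min_shared=1):
    """Find library items that share at least `min_shared` tags with an article.

    `library` maps a title to its list of tags. Returns (title, shared_tags)
    pairs, with the strongest matches first.
    """
    wanted = {tag.lower() for tag in article_tags}
    matches = []
    for title, tags in library.items():
        shared = wanted & {tag.lower() for tag in tags}
        if len(shared) >= min_shared:
            matches.append((title, sorted(shared)))
    # More shared tags first; title breaks ties.
    matches.sort(key=lambda match: (-len(match[1]), match[0]))
    return matches

# Invented sample content for a third-party news site.
site_library = {
    "Gulf pipeline update": ["Middle East", "Oil"],
    "Cairo travel guide": ["Middle East", "Travel"],
    "Local sports roundup": ["Sports"],
}
print(related_items(["Middle East", "Oil"], site_library))
```

<p>Nothing here knows where the content came from; the links fall out of the metadata alone, which is the point of the example.</p>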
<p>If the users have knowledge and the will to share it, saving us time and making them happier in the process, then it seems only fitting that we help them do so.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/04/18/how-dynamic-publishing-can-make-your-content-less-expensive-to-create-and-make-your-customers-love-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cleaning Up eBook Conversion Messes: Tips For Success</title>
		<link>http://www.thedynamicpublisher.com/2012/03/28/cleaning-up-ebook-conversion-messes-tips-for-success/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=cleaning-up-ebook-conversion-messes-tips-for-success</link>
		<comments>http://www.thedynamicpublisher.com/2012/03/28/cleaning-up-ebook-conversion-messes-tips-for-success/#comments</comments>
		<pubDate>Wed, 28 Mar 2012 16:44:31 +0000</pubDate>
		<dc:creator>Mark Gross</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[content conversion]]></category>
		<category><![CDATA[Digital Publishing]]></category>
		<category><![CDATA[ebooks]]></category>
		<category><![CDATA[EPUB]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1518</guid>
		<description><![CDATA[By Mark Gross, with Devorah Bloom, DCL In my previous column, Understanding Content Conversion: Unfortunately, There’s No ‘Easy’ Button, I examined the various areas that trip up the eBook conversion process –special characters, tables, hyphenations, and so on.  This column follows on that theme, based on a question from a consultant with a major consulting firm: [...]]]></description>
			<content:encoded><![CDATA[<p><strong>By Mark Gross, with Devorah Bloom, DCL</strong></p>
<p>In my previous column, <a href="http://www.thedynamicpublisher.com/2012/01/17/understanding-content-conversion-unfortunately-there’s-no-‘easy’-button/">Understanding Content Conversion: Unfortunately, There’s No ‘Easy’ Button</a>, I examined the various areas that trip up the eBook conversion process –special characters, tables, hyphenations, and so on.  This column follows on that theme, based on a question from a consultant with a major consulting firm:</p>
<blockquote><p>“…I am researching the time it takes on average to convert from <a href="http://www.adobe.com/pdf/">PDF</a> to <a href="http://idpf.org/epub/30">EPUB</a>, including the time it takes to edit the ‘rough’ EPUB file created by the automated conversion to clean up the formatting errors, resulting in a ‘clean’ EPUB suitable for display on an eReader device. I am having trouble locating such a statistic.”</p></blockquote>
<p>Why is it so difficult to find this information, you ask?  That&#8217;s easy. Because there isn’t one specific answer. The time it takes to correct an automated conversion (cleanup errors) depends on many factors:</p>
<ol>
<li>complexity of content in the source document</li>
<li>type of source file</li>
<li>completeness of the automated converter</li>
<li>post-conversion, manual cleanup</li>
<li>skill level of the people doing the work</li>
<li>how clean do you want it</li>
</ol>
<p>&nbsp;</p>
<p><strong><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/simpleeReader-e1332880203465.jpg" rel="shadowbox[sbpost-1518];player=img;"><img class="alignright size-full wp-image-1524" title="E-Book Reader with Novel on Screen" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/simpleeReader-e1332880203465.jpg" alt="" width="225" height="225" /></a>Complexity</strong></p>
<p>Most novels are easier to clean up than multi-column textbooks or complex technical content containing formatted tables, images, poetry, sidebars, and footnotes. Complexity increases with the number of complex elements that must be converted and reconfigured to fit the often much smaller areas of screen real estate common on smartphones, tablets and eBook readers.  Even novels are not that simple to convert, as discussed in Devorah Bloom’s webinar, <a href="http://www.dclab.com/learning_series/20110519_automated_ebook_conversion.asp">What to Expect from Automated Conversion to eBook</a>, but they are usually quicker to do than complex scientific articles with math, chemistry, and all that stuff.</p>
<p><strong>Source file</strong></p>
<p>The format of your source content will impact how long it will take to prepare your materials for conversion. It’s important to carefully consider all sources available. It may be obvious that converting from paper will be the most costly, as proofreading will be necessary to make sure everything is correct, but even with electronic files, problems can be expected. PDFs, for instance, are the most common type of source file. They come in many variations, and while some are much better sources than others, common problems introduced in PDF files include:</p>
<ul>
<li>word spacing</li>
<li>paragraph delineation</li>
<li>hyphens</li>
<li>emphasis</li>
<li>special characters.</li>
</ul>
<p>Word processing files, proprietary software formats, standard file types (XML, HTML, etc), and pretty much every other type of file, will also introduce challenges.</p>
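<p>To give a feel for what fixing even two of the PDF problems listed above involves (hyphens and word spacing), here is a deliberately naive Python sketch. It is not how any particular conversion tool works; the function name <strong>clean_extracted_text</strong> is invented for this example, and the rules are assumptions that will misfire on real content (for instance, a genuine compound word that happens to break at its hyphen will be wrongly joined).</p>

```python
import re

def clean_extracted_text(text):
    """Naive cleanup of text extracted from a PDF.

    Assumptions: a hyphen directly before a line break is a soft hyphen,
    and a blank line marks a paragraph boundary.
    """
    # Rejoin words split across lines: "conver-\nsion" becomes "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    paragraphs = []
    for chunk in re.split(r"\n\s*\n", text):
        # Collapse line breaks and runs of spaces into single spaces.
        chunk = re.sub(r"\s+", " ", chunk).strip()
        if chunk:
            paragraphs.append(chunk)
    return "\n\n".join(paragraphs)

raw = "The conver-\nsion  was   rough.\n\nNext para-\ngraph."
print(clean_extracted_text(raw))
```

<p>Two short rules already carry two assumptions that can be wrong, which is why the amount of cleanup depends so heavily on the source.</p>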
<p><strong><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/bookslettersexplodingout.jpg" rel="shadowbox[sbpost-1518];player=img;"><img class="alignright size-full wp-image-1528" title="bookslettersexplodingout" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/bookslettersexplodingout-e1332880339794.jpg" alt="" width="225" height="310" /></a>Conversion software</strong></p>
<p>Results from automated conversion software (and automated conversion scripts) vary widely, but in most cases if the software you opt to use isn’t tuned to your content, the conversion will be rough. Rough conversions require serious clean-up. It&#8217;s almost always best to invest the effort up front to tune the software to the content being converted.</p>
<p>Sometimes, tuning isn&#8217;t enough. Some commercial content conversion software (this includes freeware) is overly strict, lacking in the flexibility department. While conversion rules are necessary, you&#8217;ll want to use tools that provide the flexibility needed to handle all the situations you&#8217;re likely to encounter when converting source content that isn&#8217;t as clean as you’d like.</p>
<p>While writers don&#8217;t maliciously introduce problems into the documents they create, they are the source of many conversion challenges. It&#8217;s not their fault, actually. They lack an understanding of how their actions create conversion problems and have never been equipped with the knowledge (or the tools) needed to produce easy-to-convert documents. But, knowing this fact will make it easier for you to select the approach that works best for your organization.</p>
<p>One approach is to take whatever content you have been provided and work to clean it up, manually.</p>
<p>Another approach is to use commercial conversion software to make a first pass. When problems crop up (and they will), go back and modify the original documents so that they fit the software’s expectations. While this approach is workable, it&#8217;s time-consuming and expensive.</p>
<p>A third approach (that is particularly useful on large document sets) is to work with a firm that specializes in content conversion. Look for a company that has developed conversion software which is designed to be continually adjusted (tuned) to meet new needs. This approach will allow you to continually leverage the power of the conversion software to do as much of the work as possible. At DCL, we use this approach and we do so because every tiny accuracy improvement we make pays tremendous dividends in the clean-up phase.</p>
<p><strong>Cleanup</strong></p>
<p>This is where the big variances enter the equation. If the conversion process worked effectively, this phase would just be a review phase and would go very quickly. However, if the conversion is rough, having left behind a lot of debris, it takes longer since you have to find and fix things, and some of those things are difficult and time-consuming to fix by hand. If you’re still intent on doing all this yourself, you should test the results of the conversion on a small, representative sample of your content to better understand what’s involved.</p>
<p><strong>Review</strong></p>
<p>After cleanup, everything has to be reviewed. It&#8217;s a necessary step that far too many people skip, leading to content quality problems. Additional, device-specific review and testing will need to be conducted if you&#8217;re outputting your content to multiple device types. This step is not intended to clean up errors, but rather to ensure everything worked well. It&#8217;s an important task, not to be relegated to clerical staff. Instead, it&#8217;s best conducted by folks who understand the content, the audience, and the devices on which the content will be displayed.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/qualitymanagement-e1332881036671.jpg" rel="shadowbox[sbpost-1518];player=img;"><img class="alignright size-full wp-image-1536" title="Quality management" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/qualitymanagement-e1332881036671.jpg" alt="" width="250" height="167" /></a>Review is more than just comparing the original copy to the final result. Because changes are introduced in the eBook versions that were not part of the source file (text is repositioned to support device screen size and orientation, for example), it’s important that reviewers are equipped with the knowledge necessary to know what to check for and what to ignore.</p>
<p><strong>How clean do you want it?</strong></p>
<p>In the traditional book publishing world, perfection was the standard, but that seems to have changed with the rush to get eBooks to market – especially with short run books that need to get out quickly. While a medical text requires checking, double-checking, and triple-checking, other kinds of books might be acceptable with the occasional extraneous hyphen and bullets that don’t wrap exactly right. I’m a little old-fashioned on this, and prefer the perfection approach, but I do recognize that there are short-cuts that some may feel comfortable taking.</p>
<p><strong>Conclusion</strong></p>
<p>So the short answer to the question of how long it should take to produce clean eBook content is based on a number of factors. Each of these variables contributes to the total amount of time you&#8217;ll need to spend on correcting an automated PDF to EPUB conversion; it may be 3-4 hours, but it can also take 3-4 days &#8212; or longer. It all depends&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/03/28/cleaning-up-ebook-conversion-messes-tips-for-success/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>What is XML Really About?</title>
		<link>http://www.thedynamicpublisher.com/2012/03/06/what-is-xml-really-about/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-is-xml-really-about</link>
		<comments>http://www.thedynamicpublisher.com/2012/03/06/what-is-xml-really-about/#comments</comments>
		<pubDate>Tue, 06 Mar 2012 15:15:33 +0000</pubDate>
		<dc:creator>Mark Baker</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[content as a database]]></category>
		<category><![CDATA[data structure]]></category>
		<category><![CDATA[Markup language]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[semantic markup]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[standardized content]]></category>
		<category><![CDATA[structured content]]></category>
		<category><![CDATA[XHTML]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1365</guid>
		<description><![CDATA[by Mark Baker We all know that XML is a good thing. The pundits and the vendors all tell us so. XML opens up many possibilities for content. But exactly how (and why) it does so is not always made clear. Getting your content into XML is not, by itself, enough to deliver any or [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/XML.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignright size-full wp-image-1474" title="XML" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/XML-e1330810260125.jpg" alt="" width="200" height="200" /></a>by Mark Baker</p>
<p>We all know that <a href="http://www.w3.org/XML/">XML</a> is a good thing. The pundits and the vendors all tell us so. XML opens up many possibilities for content. But exactly how (and why) it does so is not always made clear. Getting your content into XML is not, by itself, enough to deliver any or all of the things that are promised for it. To understand why, we need to look at what XML is really about.</p>
<p>I’m going to assume you know the basic mechanics of XML and that you recognize elements, attributes and the tags that define them. I’m assuming therefore that you recognize that the sample below, which is an excerpt from an <a href="http://en.wikipedia.org/wiki/XHTML">XHTML</a> document, contains a <strong>&lt;p&gt;</strong> element with two <strong>&lt;i&gt;</strong> <a href="http://www.w3.org/TR/xhtml2/mod-structural.html">elements</a> inside it, and that these elements are defined using <a href="http://www.webreference.com/xml/reference/xhtml.html">tags</a>, which are the things inside the angle brackets:</p>
<p>&lt;p&gt;&lt;i&gt;War and Peace&lt;/i&gt; is a &lt;i&gt;very&lt;/i&gt; long book.&lt;/p&gt;</p>
<p>These tags are metadata. When people talk about metadata, they often think of it only as a label attached to a piece of content, usually by a content management system, and used to track the document and make it easier to find. That is certainly one important use of metadata, but metadata is much broader than that. Metadata means data that describes other data. There are many useful forms of metadata, including XML tags. (For more on this, see <a href="http://everypageispageone.com/2011/05/20/the-meaning-of-metadata/">The Meaning of Metadata</a>).</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/War-and-Peace-book.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignleft size-full wp-image-1478" title="War and Peace" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/War-and-Peace-book-e1330815636512.jpg" alt="" width="150" height="225" /></a>The reason that XML is so useful is that it allows us to assign metadata to content at any level of granularity, not just to a document as a whole, but to an individual sentence or an individual word or phrase. In the sample above, the <strong>&lt;i&gt;</strong> tag tells us that the string “War and Peace&#8221; should be rendered in <em>italics</em>. (This is what <strong>&lt;i&gt;</strong> means in XHTML. An <strong>&lt;i&gt;</strong> tag could mean something completely different in another tagging language.)</p>
<p>Of course, “render this in italics&#8221; is not the most sophisticated piece of metadata in the world. Also, it seems to violate the oft-cited rule that XML separates content from formatting. Actually, that oft-cited rule is wrong. Some XML tagging languages, such as XHTML and <a href="http://www.w3.org/TR/xsl/">XSL-FO</a> are designed specifically to <strong>apply</strong> formatting to content. Some XML languages are designed to separate content from formatting. Others have nothing to do with content at all. A much better rule would be to say that XML allows you to apply metadata to content, and that the metadata you apply can express just about anything you want it to.</p>
<p>So, XHTML does attach formatting to content, and an XHTML processing application (like, say, a web browser) would render our sample content like this:</p>
<p><em>War and Peace</em> is a <em>very</em> long book.</p>
<p>That’s fine, as far as it goes, but XML can enable us to do a lot more.</p>
<h2>Separating text from formatting</h2>
<p>If we do want to separate content from formatting, we need to add some metadata to create the separation. XHTML provides a way for us to take the first step along that road by using the <strong>&lt;em&gt;</strong> (emphasis) tag rather than the <strong>&lt;i&gt;</strong> tag:</p>
<p>&lt;p&gt;&lt;em&gt;War and Peace&lt;/em&gt; is a &lt;em&gt;very&lt;/em&gt; long book.&lt;/p&gt;</p>
<p>Rather than specifying that “War and Peace&#8221; and “very&#8221; be printed in italics, this markup simply says that they are to be emphasized. It leaves it up to the processing application to decide how to emphasize them. For instance, it could do this:</p>
<p><span style="color: red;"><strong>War and Peace</strong></span> is a <span style="color: red;"><strong>very</strong></span> long book.</p>
<p>But there is a problem here. The processing application has every right to print the emphasized content in <span style="color: red;"><strong>bold red text</strong></span>, because making text bold and red certainly does emphasize it. But there are conventions for how you show the title of a book, and that convention is that it should be rendered in <em>italics</em>.</p>
<p>If we are going to separate content from formatting, therefore, we had better do it properly. If we just use <strong>&lt;em&gt;</strong> as a synonym for <strong>&lt;i&gt;</strong>, then we are not actually separating the content from the formatting and we would be better served to stick to <strong>&lt;i&gt;</strong> since it is actually a better way of saying what we mean.</p>
<p>If we truly want to separate content from formatting, we had better find a more discriminating way to go about it than simply replacing <strong>&lt;i&gt;</strong> with <strong>&lt;em&gt;</strong> everywhere. If we are not going to format the text directly, then we need to give the processing application enough metadata that it can distinguish things that ought to be formatted differently.</p>
<div>
<p>In order to give the processing application enough information to format “War and Peace” correctly, we need to provide metadata that says that the string “War and Peace” is a title:</p>
<p>&lt;p&gt;&lt;title&gt;War and Peace&lt;/title&gt; is a &lt;em&gt;very&lt;/em&gt; long book.&lt;/p&gt;</p>
</div>
<p>Now we have provided enough information for an XML processor to render the sentence appropriately:</p>
<p><em>War and Peace</em> is a <span style="color: red;"><strong>very</strong></span> long book.</p>
<div>
<p>However, in adding the <strong>&lt;title&gt;</strong> tag to our tagging language, we have moved away from XHTML. XHTML does support a <strong>&lt;title&gt;</strong> tag, but it uses it inside the <strong>&lt;head&gt;</strong> element to capture the title of the current document. It does not support the use of <strong>&lt;title&gt;</strong> inside the <strong>&lt;p&gt;</strong> element for marking up the titles of books. If we are no longer using XHTML, what are we using? We are now using a tagging language of our own invention. I’ll call it <strong>YAMLX</strong> (Yet Another Markup Language eXample). We are beginning to capture metadata that is specific to our business. That means that we have to start taking responsibility for our own markup design, and either find an existing markup language that provides the metadata we need, or create one ourselves. For purposes of this article, it doesn’t matter whether YAMLX is a publicly available language or one you create yourself. What matters is that YAMLX captures the metadata you need to run your content process efficiently.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/CSS.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignright size-full wp-image-1480" title="CSS" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/CSS-e1330816246994.jpg" alt="" width="150" height="120" /></a>Of course, since YAMLX is not XHTML, web browsers will not understand it directly. In order to publish it, you will need to process it in some way so that browsers will know how to render it. There are a couple of ways to do this. One is to create a <a href="http://www.w3schools.com/css/css_intro.asp">CSS style sheet</a> to tell the browser how to display YAMLX elements, and another is to create an <a href="http://www.w3.org/TR/xslt11/">XSLT script</a> to convert YAMLX to HTML. Some of the stuff we are going to add to YAMLX later will move us in the direction of using XSLT, so that is what we will look at here.</p>
<p>Fortunately, YAMLX (so far) only differs from XHTML in one tag, so we can write an XSLT template to convert a <strong>&lt;title&gt;</strong> element occurring inside a <strong>&lt;p&gt;</strong> element to a valid XHTML tag and just copy the rest over. Here’s the template that converts the <strong>&lt;title&gt;</strong> element to an <strong>&lt;i&gt;</strong> element:</p>
<p>&lt;xsl:template match="p/title"&gt;<br />
&lt;i&gt;<br />
&lt;xsl:apply-templates/&gt;<br />
&lt;/i&gt;<br />
&lt;/xsl:template&gt;</p>
<p>Don&#8217;t let this markup confuse you. It&#8217;s very simple. It says to the content rendering engine, if you see a <strong>&lt;title&gt;</strong> element inside a <strong>&lt;p&gt;</strong> element, output an <strong>&lt;i&gt;</strong> element in its place. It doesn’t matter if you get exactly how this works, but as you can tell, it isn’t rocket science.</p>
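<p>If XSLT is unfamiliar, the same idea can be sketched in a general-purpose language. The Python below is only an illustration and not part of any YAMLX toolchain: the function name yamlx_to_xhtml is made up for this example, and it simply renames each <strong>&lt;title&gt;</strong> element found directly inside a <strong>&lt;p&gt;</strong> element to <strong>&lt;i&gt;</strong>, leaving everything else as it was.</p>
<pre>
```python
import xml.etree.ElementTree as ET

def yamlx_to_xhtml(root):
    """Rename every title element that sits directly inside a p element to i.

    Everything else is left untouched, which mirrors "just copy the rest over".
    """
    for paragraph in root.iter("p"):
        for title in paragraph.findall("title"):
            title.tag = "i"
            title.attrib.clear()  # i carries no YAMLX-specific attributes
    return root

# Build a tiny YAMLX paragraph with the ElementTree API.
paragraph = ET.Element("p")
title = ET.SubElement(paragraph, "title")
title.text = "War and Peace"
title.tail = " is a "
emphasis = ET.SubElement(paragraph, "em")
emphasis.text = "very"
emphasis.tail = " long book."

converted = ET.tostring(yamlx_to_xhtml(paragraph), encoding="unicode")
print(converted)
```
</pre>
<p>Printing the converted paragraph shows the title wrapped in an italic element, with the emphasis element and the surrounding text unchanged.</p>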
<h2 dir="ltr">Semantic markup</h2>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.24.37-PM.png" rel="shadowbox[sbpost-1365];player=img;"><img class="alignleft size-full wp-image-1488" title="1400079985 Barcode for War and Peace" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.24.37-PM-e1330817124787.png" alt="" width="200" height="150" /></a>Okay, so we have separated text from formatting, and made a distinction between titles and general emphasis so that they can be formatted differently, but the only thing we can really do with this markup is apply formatting to it. Separating content from formatting isn’t exactly a productivity revolution if all you are going to do is slap them back together again. To get any real benefit from the separation, we need to do more. So let’s update YAMLX to capture some more useful metadata:</p>
<p>&lt;p&gt;&lt;title isbn="1400079985"&gt;War and Peace&lt;/title&gt; is a &lt;em&gt;very&lt;/em&gt; long book.&lt;/p&gt;</p>
<p>Here we have added some more metadata to the <strong>&lt;title&gt;</strong> tag in the form of an <strong>isbn</strong> attribute. With this additional metadata, the markup does not merely identify “War and Peace” as a title, it identifies it as the title of a <a href="http://openisbn.com/isbn/1400079985/">particular work</a>.</p>
<p>What can we do with this additional metadata? An <a href="http://www.isbn.org/standards/home/about/index.html">ISBN number</a> is the key to a large amount of data about a published book. If we have the ISBN number, we can <a href="http://www.google.com/search?client=safari&amp;rls=en&amp;q=067003469X&amp;ie=UTF-8&amp;oe=UTF-8#hl=en&amp;client=safari&amp;rls=en&amp;sclient=psy-ab&amp;q=1400079985&amp;pbx=1&amp;oq=1400079985&amp;aq=f&amp;aqi=&amp;aql=&amp;gs_sm=3&amp;gs_upl=42287l42287l0l42430l1l1l0l0l0l0l0l0ll0l0&amp;gs_l=serp.3...42287l42287l0l42431l1l1l0l0l0l0l0l0ll0l0&amp;bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&amp;fp=cd42fa2092e2aba0&amp;biw=1343&amp;bih=830">look up all sorts of other information</a>. For instance, we can use the ISBN to look up publication details using a web service like <a href="http://www.isbn.org/standards/home/index.asp">ISBNdb</a>.</p>
<p>Most web services return information in XML, which is perfect for us, since our content is in XML. A hypothetical ISBN web service might return an XML document that looked like this (this is not what ISBNdb returns, just a simplified example):</p>
<p>&lt;book&gt;<br />
&lt;isbn&gt;1400079985&lt;/isbn&gt;<br />
&lt;title&gt;War and Peace&lt;/title&gt;<br />
&lt;author&gt;Leo Tolstoy&lt;/author&gt;<br />
&lt;publisher&gt;Vintage&lt;/publisher&gt;<br />
&lt;publication-year&gt;2008&lt;/publication-year&gt;<br />
&lt;page-count&gt;1296&lt;/page-count&gt;<br />
…<br />
&lt;/book&gt;</p>
<p>We could then pull pieces from that XML document to add to our own content, thus allowing us to produce output like this:</p>
<p><em>War and Peace</em> (Leo Tolstoy, Vintage, 2008, 1296 pages) is a <span style="color: red;"><strong>very</strong></span> long book.</p>
<p>Of course, we don’t do this by hand. We use a script to do it. Just to demonstrate that this is not rocket science either, here is a snippet of XSLT code that does this:</p>
<p>&lt;xsl:template match="p/title"&gt;<br />
&lt;!-- capture the isbn number to look up --&gt;<br />
&lt;xsl:variable name="isbn" select="@isbn"/&gt;</p>
<p>&lt;!-- call the web service to get book info using the isbn --&gt;<br />
&lt;xsl:variable name="book-info" select="document(concat('http://example.com/isbn/lookup?', $isbn))"/&gt;</p>
<p>&lt;!-- output the book title --&gt;<br />
&lt;i&gt;<br />
&lt;xsl:apply-templates/&gt;<br />
&lt;/i&gt;</p>
<p>&lt;!-- output the additional book info --&gt;<br />
&lt;xsl:text&gt; (&lt;/xsl:text&gt;<br />
&lt;xsl:value-of select="$book-info/book/author"/&gt;<br />
&lt;xsl:text&gt;, &lt;/xsl:text&gt;<br />
&lt;xsl:value-of select="$book-info/book/publisher"/&gt;<br />
&lt;xsl:text&gt;, &lt;/xsl:text&gt;<br />
&lt;xsl:value-of select="$book-info/book/publication-year"/&gt;<br />
&lt;xsl:text&gt;, &lt;/xsl:text&gt;<br />
&lt;xsl:value-of select="$book-info/book/page-count"/&gt;<br />
&lt;xsl:text&gt; pages)&lt;/xsl:text&gt;<br />
&lt;/xsl:template&gt;</p>
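<p>The same merge can be sketched in Python for readers who find that easier to follow. This is a rough illustration under stated assumptions, not a real client for ISBNdb or any other service: the BOOK_SERVICE dictionary is a hypothetical stand-in for the web service call, and the function name enrich_titles is invented for the example.</p>
<pre>
```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the ISBN web service: a dictionary keyed by ISBN.
BOOK_SERVICE = {
    "1400079985": {
        "author": "Leo Tolstoy",
        "publisher": "Vintage",
        "publication-year": "2008",
        "page-count": "1296",
    }
}

def enrich_titles(paragraph, lookup):
    """Turn each title into i and append the looked-up publication details."""
    for title in paragraph.findall("title"):
        info = lookup[title.get("isbn")]
        details = " ({author}, {publisher}, {year}, {pages} pages)".format(
            author=info["author"],
            publisher=info["publisher"],
            year=info["publication-year"],
            pages=info["page-count"],
        )
        title.tag = "i"
        title.attrib.clear()
        # The details go in front of whatever text followed the title.
        title.tail = details + (title.tail or "")
    return paragraph

paragraph = ET.Element("p")
title = ET.SubElement(paragraph, "title", isbn="1400079985")
title.text = "War and Peace"
title.tail = " is a very long book."

enriched = "".join(enrich_titles(paragraph, BOOK_SERVICE).itertext())
print(enriched)
```
</pre>
<p>The author typed only the title and the ISBN. The author, publisher, year, and page count all come from the lookup.</p>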
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/war-and-peace-book-cover-e1330816567390.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignright size-full wp-image-1484" title="War and Peace book cover, Vintage Press" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/war-and-peace-book-cover-e1330816567390.jpg" alt="" width="150" height="222" /></a>Again, not rocket science, but this basic technique opens all kinds of doors. The publication information was not in the original XML. It was pulled from another source using metadata captured in the XML. The power of semantic markup to enable the merging of information from different sources is enormous. Here are just a few of the tricks we could pull with information retrieved using the ISBN number:</p>
<ul>
<li>Pull in a picture of the book cover.</li>
<li>Create a link to an article on War and Peace on your website.</li>
<li>Create a link to an online bookstore, where the reader could buy the book. If you belonged to an affiliate program for an online bookstore, you could pick up some cash every time a reader followed your link and bought a book. Now the ISBN number metadata has turned a casual reference into a potential source of revenue.</li>
</ul>
<h2 dir="ltr">Making authors more efficient</h2>
<p>There are also some major process efficiencies to be realized by capturing this kind of metadata in your XML content. If you can use metadata keys to pull information from external sources, authors don’t have to look up all that information themselves when they write. Authors don’t have to decide which of the book details are going to appear in the final output. That decision is made by editing the XSLT stylesheet, and it can be changed, for all your existing content, simply by changing the stylesheet.</p>
<p>As you can see, inserting one simple piece of metadata into our XML lets us save a lot of time when authoring, and leaves all our options open as to which details will be published. This efficiency and flexibility can turn into substantial cost savings and increased revenues when dealing with a large content set.</p>
<h2 dir="ltr">Further refinement of the metadata</h2>
<p>Though including the ISBN number in YAMLX gives us a lot of options, there are some problems with this markup.<a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.43.23-PM.png" rel="shadowbox[sbpost-1365];player=img;"><img class="aligncenter size-full wp-image-1495" title="War and Peace Editions" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.43.23-PM.png" alt="" width="469" height="344" /></a></p>
<p>The first problem is the accuracy of the metadata. An ISBN number does not identify a literary work directly. It identifies a particular edition of a book from a particular publisher in a particular binding in a particular year. This distinction can be important. There are <a href="http://www.google.com/search?client=safari&amp;rls=en&amp;q=067003469X&amp;ie=UTF-8&amp;oe=UTF-8#q=war+and+Peace&amp;hl=en&amp;client=safari&amp;rls=en&amp;prmd=imvnsb&amp;source=lnms&amp;tbm=bks&amp;ei=GodST9eSMIjF0QWAmrzsCw&amp;sa=X&amp;oi=mode_link&amp;ct=mode&amp;cd=7&amp;ved=0CBYQ_AUoBg&amp;bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&amp;fp=ab662359c0ba77e&amp;biw=1343&amp;bih=830">many other editions</a> of War and Peace, in <a href="http://en.wikipedia.org/wiki/War_and_Peace#English_and_other_translations">many languages</a>. War and Peace is a very long book in all those editions and all those languages. The paragraph is not referring specifically to the Vintage edition of 2008. It is referring to War and Peace as a novel generally. Using the ISBN actually makes the metadata more specific than the text it is marking up. That could be a problem for some of the ways we might want to query this content.</p>
<p>Suppose we wanted to make a list of all the statements about the novel War and Peace in our content. We can do this easily enough using an <a href="http://www.w3.org/TR/xpath/#section-Introduction">XPath</a> expression like this:</p>
<p>p[title="War and Peace"]</p>
<p>This says, give me all the <strong>&lt;p&gt;</strong> elements that contain a <strong>&lt;title&gt;</strong> element with the content “War and Peace&#8221;. Again, don’t worry about the syntax. My only reason for showing you this is to show that none of this is rocket science.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.28.56-PM.png" rel="shadowbox[sbpost-1365];player=img;"><img class="alignleft size-full wp-image-1490" title="Movie: War and Peace with Audrey Hepburn and Henry Fonda" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.28.56-PM-e1330817501986.png" alt="" width="250" height="195" /></a>There is a problem with this XPath expression, however. It will return all the paragraphs that contain a title “War and Peace”. But this might include references to the title of the <a href="http://en.wikipedia.org/wiki/War_and_Peace_(1956_film)">movie</a> of the same name, or the <a href="http://en.wikipedia.org/wiki/War_and_Peace_(Prokofiev)">opera</a>, or the non-fiction book called “War and Peace”, and that is not what we want. We just want statements about the novel.</p>
<p>To narrow it down to the novel, we look for additional metadata that can help narrow the focus. One such piece of metadata is the ISBN, so we could try this:</p>
<p>p[title/@isbn="1400079985"]</p>
<p>This says, give me all the <strong>&lt;p&gt;</strong> elements that contain a <strong>&lt;title&gt;</strong> element with an <strong>isbn</strong> attribute that has a <strong>value</strong> of “1400079985”. That will certainly eliminate any movies, operas and non-fiction books, but it could also miss some references to the novel. The problem is that there is more than one ISBN number that can refer to <em>War and Peace</em> the novel, since it exists in many different editions. There is nothing to say that every author who marks up a reference to <em>War and Peace</em> will look up the same ISBN. This is why it is a problem that our metadata is more specific than the content it describes: it can cause us to miss some instances of the content.</p>
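<p>To make the difference between the two queries concrete, here is a small Python sketch that runs both against three sample paragraphs. Treat it as an illustration only: the helper name add_paragraph, the placeholder ISBN “0000000000”, and the sample sentences are all invented for the example.</p>
<pre>
```python
import xml.etree.ElementTree as ET

def add_paragraph(parent, title_text, rest, isbn=None):
    """Append a p element holding one title element followed by plain text."""
    paragraph = ET.SubElement(parent, "p")
    attributes = {"isbn": isbn} if isbn else {}
    title = ET.SubElement(paragraph, "title", attributes)
    title.text = title_text
    title.tail = rest
    return paragraph

# The second ISBN and the sample sentences are made up for the example.
body = ET.Element("body")
add_paragraph(body, "War and Peace", " is a very long book.", isbn="1400079985")
add_paragraph(body, "War and Peace", " has a huge cast.", isbn="0000000000")
add_paragraph(body, "War and Peace", " stars Audrey Hepburn.")

# Equivalent of p[title="War and Peace"]: matches on the title text alone.
by_title = body.findall("p[title='War and Peace']")

# Equivalent of p[title/@isbn="1400079985"]: matches on one edition's ISBN.
by_isbn = [
    p for p in body.findall("p")
    if any(t.get("isbn") == "1400079985" for t in p.findall("title"))
]

print(len(by_title), len(by_isbn))
```
</pre>
<p>The title query is too broad, because it also returns the paragraph about the movie. The ISBN query is too narrow, because it misses the paragraph that was marked up with a different edition.</p>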
<p>A second problem is author productivity. The author who wrote the paragraph probably doesn’t know the ISBN of any particular edition of <em>War and Peace</em> off the top of their head. If the markup called for an ISBN, the author would have to stop and look one up. Saving authors from having to stop and look things up can produce some significant productivity benefits. It can also potentially increase your pool of available authors.</p>
<p>So, using the ISBN as metadata is too precise and makes life difficult for authors. We need to come up with some markup that is at the right level of precision and is easier for authors to create. For example, we could do this:</p>
<p>&lt;p&gt;&lt;novel author="Leo Tolstoy"&gt;War and Peace&lt;/novel&gt; is a &lt;em&gt;very&lt;/em&gt; long book.&lt;/p&gt;</p>
<p>Here, we have replaced the <strong>&lt;title&gt;</strong> tag with the more specific <strong>&lt;novel&gt;</strong> tag, and replaced the too-specific <strong>isbn</strong> attribute with the just-specific-enough <strong>author</strong> attribute (just in case another author has also written a novel called <em>War and Peace</em>).</p>
<p>This markup is obviously easier for authors to create. It only asks them for the things they already know, so they won’t have to stop while authoring to look anything up.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/LeoTolstoy.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignright size-full wp-image-1491" title="Leo Nikolayevich Tolstoy" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/LeoTolstoy-e1330817720776.jpg" alt="" width="200" height="221" /></a>What about selecting every paragraph that refers to the novel by Tolstoy? We can do this more accurately as well, using an XPath like this:</p>
<p>p[novel[@author="Leo Tolstoy"] = "War and Peace"]</p>
<p>This says, give me all the <strong>&lt;p&gt;</strong> elements that have a <strong>&lt;novel&gt;</strong> element that has an <strong>author</strong> attribute with the <strong>value</strong> “Leo Tolstoy” and whose content is “War and Peace”. This is what we wanted, so it looks like we have got our metadata correct now.</p>
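<p>Here is roughly what that query does, written out in Python as a hedged sketch. The function name references_novel is invented, and so is the <strong>&lt;film&gt;</strong> element, which is not part of YAMLX as described here and is used only to show a reference that should not match.</p>
<pre>
```python
import xml.etree.ElementTree as ET

def references_novel(paragraph, title, author):
    """True if the paragraph holds a novel element with this author and text."""
    return any(
        novel.get("author") == author and "".join(novel.itertext()) == title
        for novel in paragraph.findall("novel")
    )

body = ET.Element("body")

first = ET.SubElement(body, "p")
novel = ET.SubElement(first, "novel", author="Leo Tolstoy")
novel.text = "War and Peace"
novel.tail = " is a very long book."

# A hypothetical reference to the film, marked up with a made-up film element.
second = ET.SubElement(body, "p")
film = ET.SubElement(second, "film")
film.text = "War and Peace"
film.tail = " stars Audrey Hepburn."

matches = [
    p for p in body.findall("p")
    if references_novel(p, "War and Peace", "Leo Tolstoy")
]
print(len(matches))
```
</pre>
<p>Only the paragraph about the novel is selected, and no one had to look up an ISBN to make that possible.</p>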
<p>Or have we? With the ISBN metadata, we are able to pull in publication information by using the ISBN number to query the ISBN database. Without an ISBN number, how can we get that data? We can still get that data, but we have to use a different query to extract it. Our original code did the lookup like this:</p>
<p>&lt;xsl:variable name="book-info" select="document(concat('http://example.com/isbn/lookup?', $isbn))"/&gt;</p>
<p>Now we need to change it to do the lookup based on the metadata we have: category (novel), title and author:</p>
<p>&lt;xsl:template match="p/novel"&gt;<br />
&lt;!-- capture the metadata to look up --&gt;<br />
&lt;xsl:variable name="title" select="."/&gt;<br />
&lt;xsl:variable name="author" select="@author"/&gt;</p>
<p>&lt;!-- call the web service to get book info --&gt;</p>
<p dir="ltr">&lt;xsl:variable name="book-info" select="document(concat('http://example.com/isbn/lookup?category=novel&amp;amp;title=', $title, '&amp;amp;author=', $author))"/&gt;</p>
<p>&lt;!-- pick one edition and output its details (not shown) --&gt;<br />
&lt;/xsl:template&gt;</p>
<p>The only thing different about the results we will get from this query is that there may be more than one book that matches the title and author (actually, there certainly will be, since there are many editions of <em>War and Peace</em> in print). So the code that adds the book info to the content will need to pick one of the alternatives based on some relevant pieces of publication data (such as the most recent publication date). A side benefit of this is that the publication information we show will be consistent wherever we refer to <em>War and Peace</em> in our content.</p>
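<p>Picking one alternative is a small step in code. The Python sketch below assumes the lookup has already returned a list of candidate editions. Only the 2008 Vintage record comes from this article; the second record and the function name pick_edition are invented for illustration.</p>
<pre>
```python
# Hypothetical results for a title-and-author lookup; only the 2008 Vintage
# record comes from the article, the other is invented for illustration.
editions = [
    {"publisher": "Vintage", "publication-year": 2008, "page-count": 1296},
    {"publisher": "Example Press", "publication-year": 1996, "page-count": 1400},
]

def pick_edition(candidates):
    """Choose the candidate with the most recent publication year."""
    return max(candidates, key=lambda book: book["publication-year"])

chosen = pick_edition(editions)
print(chosen["publisher"], chosen["publication-year"])
```
</pre>
<p>Because the rule lives in one place, every reference to the novel ends up showing the same edition, and changing the rule changes all of them at once.</p>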
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.37.37-PM.png" rel="shadowbox[sbpost-1365];player=img;"><img class="alignleft size-full wp-image-1493" title="War and Peace Volumes" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-03-at-11.37.37-PM-e1330817935924.png" alt="" width="200" height="183" /></a>So should we always mark up “War and Peace” using the <strong>&lt;novel&gt;</strong> tag and an <strong>author</strong> attribute? Not necessarily. What we are really seeing here is that when we mention “War and Peace” in our content, we could actually be referring to different things. We could, as in the case we are looking at, be talking about the novel generally. But in another circumstance, we might be referring to a specific published edition of the novel. Even though the string is “War and Peace” in both cases, that text is referring to different things. One of the most important roles of XML-based metadata is to make these kinds of distinctions clear so that we can process each case appropriately.</p>
<p>What we actually need in YAMLX are two different tags so that we can apply the right metadata to the words “War and Peace” depending on what they mean in each case. For references to the novel, we could write:</p>
<p>&lt;p&gt;&lt;novel author="Leo Tolstoy"&gt;War and Peace&lt;/novel&gt; is a &lt;em&gt;very&lt;/em&gt; long book.&lt;/p&gt;</p>
<p>For references to a particular edition of the novel, we could extend YAMLX to include an edition element which takes an isbn attribute:</p>
<p>&lt;p&gt;&lt;edition isbn="1400079985"&gt;War and Peace&lt;/edition&gt; is &lt;em&gt;still&lt;/em&gt; on back order.&lt;/p&gt;</p>
<p>Now our processing application can recognize that the words mean something different in each case, and can process them accordingly.</p>
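<p>As a rough sketch of what “process them accordingly” might look like, the Python below branches on the element name. The function name describe_reference and the wording of its output are invented for the example. A real application would do something more useful in each branch, such as running the appropriate lookup.</p>
<pre>
```python
import xml.etree.ElementTree as ET

def describe_reference(element):
    """Report what kind of thing a marked-up reference points at."""
    if element.tag == "novel":
        return "the work '{}' by {}".format(element.text, element.get("author"))
    if element.tag == "edition":
        return "the edition of '{}' with ISBN {}".format(
            element.text, element.get("isbn")
        )
    return "unclassified text '{}'".format(element.text)

novel = ET.Element("novel", author="Leo Tolstoy")
novel.text = "War and Peace"
edition = ET.Element("edition", isbn="1400079985")
edition.text = "War and Peace"

print(describe_reference(novel))
print(describe_reference(edition))
```
</pre>
<p>The text inside both elements is identical. It is the markup around the text that tells the application which meaning is intended.</p>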
<h2 dir="ltr">Own the data format, own the functionality</h2>
<p>So what is the right tag to pick to mark up the string “War and Peace”: <strong>&lt;i&gt;</strong>, <strong>&lt;em&gt;</strong>, <strong>&lt;title&gt;</strong>, <strong>&lt;novel&gt;</strong>, or <strong>&lt;edition&gt;</strong>? There is no single right answer. It all depends on what you want to do with your data. Content is data. Content becomes data as soon as you enter it into a computer system. Whether it is a <a href="http://office.microsoft.com/en-us/word/">Microsoft Word</a> document, an <a href="http://www.adobe.com/products/framemaker.html">Adobe FrameMaker</a> file, or XML, content is data. The difference is that with XML you own the structure of your own content data.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/datastructure.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignright size-full wp-image-1497" title="Data structure" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/datastructure-e1330818568906.jpg" alt="" width="200" height="122" /></a>Owning the data structure of your content is important, because owning the data structure puts you in control of the content creation and publishing functionality. Using a binary format like Word or FrameMaker means you get the functionality of Word or FrameMaker. With some scripting, you may be able to add some additional functionality, but only insofar as the Word or FrameMaker data model supports it. With XML, you decide on the file format, and you decide what functionality you will implement to process your data. You can define the data format that best meets your particular business needs.</p>
<p>None of this means much, however, unless you do something with your data that you couldn&#8217;t have done with Word or FrameMaker &#8212; something like automatically pulling in additional content from a database in order to enhance your content and save your authors work.</p>
<p>The real take-away here is that XML makes your content into a database. Because XML allows you to apply metadata to your content at any level of granularity, it allows you to query your content at any level of granularity, and that lets you process your content at any level of granularity. At the same time, it allows you to embed metadata in your content that you can use to create a database query that pulls in other data (as we saw with the ISBN example).</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/databasepublishing.jpg" rel="shadowbox[sbpost-1365];player=img;"><img class="alignleft size-full wp-image-1499" title="Documents as databases" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/03/databasepublishing-e1330818728380.jpg" alt="" width="250" height="204" /></a>This is what every XML tagging language does &#8212; it turns content into a database. The differences between different XML languages lie in how detailed and specific the structure of that database is, and how well it aligns with a particular set of business needs. This has a really important consequence whether you are shopping for an authoring system or building your own: The data format is more important than the application functionality.</p>
<p>When you buy a conventional off-the-shelf tool like Word or FrameMaker, your buying decision is based largely on the application’s functionality. You are not looking at what it might do in the future, but at what it actually does today. How the file format that the application uses supports its functionality is not something you generally worry about. You are buying functionality, not data structure.</p>
<p>But with XML, the primary consideration should not be what functionality you get out of the box, but what functionality the data structure of the XML supports. Even if you use a tool or a tool kit with existing functionality, you are not confined to that functionality, and it should not be the primary thing you base your decision on. You can always add any functionality you need, if your data structure supports it. But you can’t build or buy functionality that the data does not support.</p>
<p>This is what you need to know, therefore, when you consider an XML solution:</p>
<ul>
<li>XML turns your content into a database. You can query that database to organize and structure your content and to combine content from different sources.</li>
<li>The design of that database determines what you can do with the data. You need to pay careful attention to your markup design to make sure it allows you to implement the functionality you need to streamline your content development process.</li>
<li>You can improve author productivity, and bring more authors into the fold, by designing the markup to ask for information they already know, and then use that information to pull in related data.</li>
<li>The key decision you have to make is not what functionality a particular tool supports out of the box, but what functionality the format of your content will support.</li>
</ul>
</div>
<h3>About the author</h3>
<div>Mark Baker, President of Analecta Communications Inc., has over 20 years of experience in the technical communications industry and over 15 years designing, implementing, and using structured authoring systems. He blogs at <a href="http://everypageispageone.com/">everypageispageone.com</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/03/06/what-is-xml-really-about/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Publishing Dynamic Product Catalogs</title>
		<link>http://www.thedynamicpublisher.com/2012/02/01/publishing-dynamic-product-catalogs/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=publishing-dynamic-product-catalogs</link>
		<comments>http://www.thedynamicpublisher.com/2012/02/01/publishing-dynamic-product-catalogs/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 22:27:57 +0000</pubDate>
		<dc:creator>JoAnn Hackos</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Adobe InDesign]]></category>
		<category><![CDATA[Automated Publishing]]></category>
		<category><![CDATA[component content]]></category>
		<category><![CDATA[Darwin Information Typing Architecture]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[product information management]]></category>
		<category><![CDATA[sales]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1363</guid>
		<description><![CDATA[As a technical information-development professional looking for opportunities to expand dynamic publishing to the enterprise, don’t overlook product sales catalogs. Product sales catalogs provide a great opportunity to demonstrate the value of single-sourcing content, integrating with purchasing databases, and automating the publishing process. Enabling sales personnel to assemble catalog copy to meet their immediate needs [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/02/product-catalog.jpg" rel="shadowbox[sbpost-1363];player=img;"><img class="alignleft size-full wp-image-1373" title="Product catalog" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/02/product-catalog-e1328134737807.jpg" alt="© SergeyIT - Fotolia.com" width="250" height="167" /></a>As a technical information-development professional looking for opportunities to expand <a href="http://www.thedynamicpublisher.com/2012/01/09/what-is-dynamic-publishing-anyway/">dynamic publishing</a> to the enterprise, don’t overlook product sales catalogs. Product sales catalogs provide a great opportunity to demonstrate the value of <a href="http://en.wikipedia.org/wiki/Single_source_publishing">single-sourcing content</a>, integrating with purchasing databases, and automating the publishing process. Enabling sales personnel to assemble catalog copy to meet their immediate needs further exploits the potential of <a href="http://www.econtentmag.com/Articles/ArticleReader.aspx?ArticleID=19433">component content management</a> and brings dynamic publishing to an otherwise skeptical group.</p>
<p>The place to begin is with the pain points.</p>
<p><strong>Introducing database publishing</strong></p>
<p>In one organization, we found a tiny group putting together product catalogs and price books for their dealers with a slow, expensive, and painstaking process. Basic product descriptions and benefits lists were combined with photographs of each product and tables of data copied manually from the price data in a corporate financial database. After the copy was assembled and approved, a graphic designer took six months using <a href="http://www.adobe.com/products/indesign.html">Adobe InDesign</a> to carefully lay out each page. By the time the catalog was completed, the pricing data was already out of date. Each catalog was a chaotic mix of page layouts with stars, bars, explosions, and other “decorative” items strewn through hundreds of unique pages. The design chaos made it difficult to update the content with last-minute changes. And customers had difficulty finding the precise products and prices they needed.</p>
<p style="text-align: center;"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/JTH_Figure11.jpg" rel="shadowbox[sbpost-1363];player=img;"><img class="aligncenter size-full wp-image-1369" title="Content Silos" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/JTH_Figure11.jpg" alt="" width="484" height="364" /></a>Figure 1: Existing catalog development process</p>
<p><strong>Dynamic development process</strong></p>
<p>So, how do you overcome old-school product catalog challenges? First, identify the common elements in the catalog copy and assist the catalog team in developing a standard. Overcoming allegiance to layout chaos means stressing the potential for cost cutting. The promise of faster turnaround and lower development costs can persuade even the most recalcitrant sales team of the benefits of standardized copy.</p>
<p>Next comes the design of a simplified authoring environment. The standard copy consists of a product name, a short description, and a benefits list. Each product is illustrated with one or more photographs. The result is a thoroughly simple design that could easily be developed by a single author.</p>
<p>The base for authoring becomes an <a href="http://www.ditaworld.com/">XML/DITA editing tool</a> that simplifies entering content. Every product description follows the exact same structure, a feature that increases information access and readability for the customers.</p>
<p>[Editor’s note: DITA refers to the <a href="http://www.ibm.com/developerworks/xml/library/x-dita1/index.html">Darwin Information Typing Architecture</a>, an XML-based architecture for authoring, producing, and delivering technical information. XML stands for <a href="http://en.wikipedia.org/wiki/XML">Extensible Markup Language</a>, a set of rules for encoding documents in a format that is both human- and machine-readable.]</p>
<p>The most complex part of the new process is designing the transforms that produce the pricing tables by drawing the data directly from the financial database. Once the required transforms are in place, each table is built automatically and appended to the content in the <a href="http://docs.oasis-open.org/dita/v1.0/archspec/topicover.html">DITA topic</a>.</p>
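<p>The project’s actual transforms are not shown here, but the general idea of building a table from database rows can be sketched in a few lines of Python. Everything specific in this sketch is an assumption made for illustration: the prices table, its columns, the sample products, and the in-memory SQLite database standing in for the corporate financial database. The output uses the DITA simpletable element with its sthead, strow, and stentry children.</p>
<pre>
```python
import sqlite3
import xml.etree.ElementTree as ET

def build_price_table(connection):
    """Build a DITA simpletable element from the rows of a prices table."""
    table = ET.Element("simpletable")
    header = ET.SubElement(table, "sthead")
    for label in ("Product", "Price"):
        ET.SubElement(header, "stentry").text = label
    query = "SELECT product, price FROM prices ORDER BY product"
    for product, price in connection.execute(query):
        row = ET.SubElement(table, "strow")
        ET.SubElement(row, "stentry").text = product
        ET.SubElement(row, "stentry").text = "{:.2f}".format(price)
    return table

# An in-memory database stands in for the corporate financial database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE prices (product TEXT, price REAL)")
connection.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("Widget", 19.5), ("Gadget", 4.0)],
)

price_table = build_price_table(connection)
print(len(price_table.findall("strow")))
```
</pre>
<p>Because the table is regenerated from the database on every run, the published prices are only as old as the last build, rather than as old as the last manual copy.</p>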
<p>For this project, we created transforms to develop custom indexes that make the information easier for dealers to find. Once the <a href="http://www.w3schools.com/xslfo/xslfo_intro.asp">XSL-FO</a> stylesheet was in place, a design and publishing process that once took months was completed in minutes, and the graphic-design costs were eliminated. Once the copy is reviewed and approved, the printing takes about two weeks. However, this company is now moving to on-demand printing triggered by a specific dealer request.</p>
<p style="text-align: center;"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/02/JTH_Figure2.jpg" rel="shadowbox[sbpost-1363];player=img;"><img class="aligncenter size-full wp-image-1370" title="Optimized Publishing Pipeline" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/02/JTH_Figure2.jpg" alt="" width="484" height="364" /></a></p>
<p style="text-align: center;">Figure 2: New Catalog Publishing Process</p>
<p>The marketing and sales managers, who had been skeptical at first, saw the potential for additional catalogs. The second project was a non-US catalog with different prices, dealer discounts, and a more limited product set. <a href="http://en.wikipedia.org/wiki/Metadata">Metadata</a> allowed for easy selection of products for more limited markets, replacing the one-size-fits-all approach of the previous design process. The automated publishing process also enables the company to produce catalogs much more frequently than they have in the past.</p>
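<p>Metadata-driven selection of this kind comes down to a filter over the product records. The Python sketch below is a simplified illustration, not the project’s implementation: the product names, the markets and industry keys, and the function name select_products are all invented.</p>
<pre>
```python
# Hypothetical product records; the metadata keys are invented for this sketch.
products = [
    {"name": "Widget", "markets": {"US", "EU"}, "industry": "automotive"},
    {"name": "Gadget", "markets": {"US"}, "industry": "marine"},
    {"name": "Gizmo", "markets": {"EU"}, "industry": "automotive"},
]

def select_products(records, market, industry=None):
    """Return the names of products tagged for a market, optionally by industry."""
    return [
        record["name"]
        for record in records
        if market in record["markets"]
        and (industry is None or record["industry"] == industry)
    ]

eu_catalog = select_products(products, "EU")
us_marine_catalog = select_products(products, "US", industry="marine")
print(eu_catalog, us_marine_catalog)
```
</pre>
<p>Each catalog is just a different set of criteria applied to the same source content, which is what makes additional catalogs cheap to produce.</p>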
<p>The company plans to link the DITA topics and the publishing process to their e-commerce site and automated online purchasing process.</p>
<p><strong>Taking automated publishing to the next level</strong></p>
<p>But automating the publishing process and integrating XML structured text with data from a relational database is only the first step in catalog publishing. In another organization, we identified the potential for customized catalogs produced on-demand for a sales representative.</p>
<p>Once again, the basic information about a product is authored in DITA topics and stored in a <a href="http://en.wikipedia.org/wiki/Component_content_management_system">component content management system</a> along with the drawings and photographs required for the catalog.</p>
<p>The analysis of the catalog in this project included not only descriptive information but tables and graphs of data. The original presentation of the information was as chaotic as in the first project. Every product section in the print catalog was unique and every page was individually designed. In some cases, product benefits lists and critical engineering data were truncated in order to fit the page layout.</p>
<p>Once again, the first step was standardization. Working with the marketing communications management, we identified a standard set of content for each product and designed a consistent layout to be produced through the <a href="http://dita-ot.sourceforge.net/">DITA Open Toolkit</a> and an XSL-FO stylesheet.</p>
<p>But the key question was how to get the sales representatives to use the source information to produce custom catalogs. We knew that sales representatives wanted to create unique catalogs for individual customers, rather than carrying around the huge catalog of the entire product line. They also wanted nothing to do with DITA topics and XML.</p>
<p>In response, we created a <a href="http://sharepoint.microsoft.com/en-us/Pages/default.aspx">SharePoint</a> interface that allows a salesperson to select from a variety of options, including product name, industry type, and other key characteristics of the information. By simply clicking check boxes, each salesperson creates a unique catalog that is fully formatted and immediately available for download and printing.</p>
<p>In a similar project some years earlier, we enabled the sales representative to use a simple, clickable interface on a website to select specific functionality for a product set and choose a cover page with the customer’s business name. Because the sales force was widely distributed, we even gave them the ability to select a printing facility nearby. All that was required of them was to drive to the printer and pick up the final catalogs for their next sales call.</p>
<p><strong>Looking for opportunities</strong></p>
<p>In most organizations, developing product catalogs and price books is cost-intensive, time-consuming, and tedious. They represent the ultimate static publishing. But the arguments for dynamic publishing of catalog content are persuasive to those watching out for the bottom line:</p>
<ul>
<li>Significantly reduce the time-to-market for catalog content from months to days</li>
<li>Significantly reduce the cost of final catalog layout, limiting it to the initial development of XSL-FO stylesheets</li>
<li>Enable the development of custom catalogs from the same source content</li>
<li>Enable on-demand printing of custom catalogs</li>
<li>Ensure that pricing data is accurate and up-to-date, coming directly from purchasing databases</li>
<li>Enable on-demand development of custom catalogs by individual sales representatives</li>
<li>Link catalog content to e-commerce websites</li>
</ul>
<p>No doubt there are many other opportunities for dynamic publishing in your organization. Your starting point is to develop a single source of all materials (text, images, and pricing data), free of formatting (XML/DITA), and presented to staff and customers through an intuitive and easily available user interface.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/02/01/publishing-dynamic-product-catalogs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding Content Conversion: Unfortunately, There’s No ‘Easy’ Button</title>
		<link>http://www.thedynamicpublisher.com/2012/01/17/understanding-content-conversion-unfortunately-there%e2%80%99s-no-%e2%80%98easy%e2%80%99-button/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=understanding-content-conversion-unfortunately-there%25e2%2580%2599s-no-%25e2%2580%2598easy%25e2%2580%2599-button</link>
		<comments>http://www.thedynamicpublisher.com/2012/01/17/understanding-content-conversion-unfortunately-there%e2%80%99s-no-%e2%80%98easy%e2%80%99-button/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 18:25:50 +0000</pubDate>
		<dc:creator>Mark Gross</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[content quality]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[legacy content]]></category>
		<category><![CDATA[quality assurance]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1318</guid>
		<description><![CDATA[By Mark Gross, President, CEO, Data Conversion Laboratory Data Conversion Laboratory, the company I founded, has been doing document conversion for thirty years, and every once in a while I still get asked by someone I haven’t seen in a while “are you still doing that?” or “isn’t there software that does all that?” The [...]]]></description>
			<content:encoded><![CDATA[<p>By Mark Gross, President, CEO, <a href="http://www.dclab.com/">Data Conversion Laboratory</a></p>
<div id="attachment_1323" class="wp-caption alignleft" style="width: 160px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/MarkGross.png" rel="shadowbox[sbpost-1318];player=img;"><img class="size-full wp-image-1323" title="MarkGross" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/MarkGross.png" alt="" width="150" height="180" /></a><p class="wp-caption-text">Mark Gross, President/CEO Data Conversion Laboratory</p></div>
<p>Data Conversion Laboratory, the company I founded, has been doing document conversion for thirty years, and every once in a while I still get asked by someone I haven’t seen in a while “are you still doing that?” or “isn’t there software that does all that?”</p>
<p>The truth is that if it were easy, it would indeed all be automated, as is already the case for news feeds, financial transactions, and other standardized data flows. But when it comes to documents and books, creativity will not be bound by rules and style sheets, especially at deadline when one wants a certain look and MS Word chooses not to cooperate. The truth is that a document can contain anything, and computer software doesn’t work well with ‘randomness’.</p>
<p>Even what to the human eye looks like a simple book – a book with simple text, no tables, and no links – still contains complications that will thwart software not meant to deal with it. In a recent test of three free software packages, not one book came out perfectly. Each and every one of them had problems. To complicate matters, each book had <a href="http://www.dclab.com/blog/2011/05/webinar-automated-conversion-to-ebook-redux/">different problems</a> (recorded webinar includes examples).</p>
<p><strong>Let’s get flexible</strong></p>
<p>In order to deliver the high quality, customized information that consumers expect – and in some cases, demand – we’re going to have to start thinking seriously about creating automated streams of standardized content – something most organizations don’t have today.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/XML.jpg" rel="shadowbox[sbpost-1318];player=img;"><img class="alignright size-full wp-image-1329" title="XML" src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/XML-e1326825411608.jpg" alt="© shockfactor - Fotolia.com" width="250" height="250" /></a>For organizations that need to distribute and monetize large collections of information, the clearest solution seems to be adopting a flexible, standardized content model and maintaining that content in an appropriate type of content management system. In our mobile-connected, always-on world, this means creating and delivering content to customers on the device of their choosing. It also means future-proofing that content so it can be quickly and efficiently prepared and delivered to the many devices that have yet to hit the market.</p>
<p>There are many content standards to choose from: <a href="http://www.w3schools.com/html/">HTML</a>, <a href="http://idpf.org/epub/30">EPUB</a>, <a href="http://www.w3schools.com/xml/">XML</a>, <a href="http://www.adobe.com/pdf/">PDF</a>, and <a href="http://dita.xml.org/node/3170">DITA</a>. But which one is right for your purposes? For your audience? For the audience of the future? Will you need to support more than one?</p>
<p>For now, let’s assume that some robust form of XML will be the right way to store your information – the specific form best for your content will need some more discussion. However, it seems that for most large collections, moving to one of the simpler formats like HTML or EPUB can be a risky investment due to the lack of flexibility they offer.</p>
<p>EPUB, for example, now on <a href="http://idpf.org/epub/30/spec/epub30-overview.html">version 3.0</a>, is specialized to the needs of electronic books as they currently exist. It’s very possible that all the features of your content are not easily definable within the EPUB standard, and are not displayable on current devices. If you limit yourself to converting to the current version of EPUB you may be limiting your content as new capabilities, not currently envisioned, are introduced. The same is true of HTML, which is designed for display of information.</p>
<p>To preserve your investment in converting to a standardized format, the safer approach is to convert and store the content in a more robust version of XML, such as DITA, <a href="http://www.docbook.org/whatis">DocBook</a>, <a href="http://dtd.nlm.nih.gov/">NLM</a>, <a href="http://www.tei-c.org/index.xml">TEI</a>, <a href="http://www.s1000d.net/">S1000D</a>, or one of the various other XML standards created for specific purposes. If the conversion is properly designed, you will then be able to automatically convert your content to EPUB, HTML, PDF, and other final formats in the future.</p>
<p><strong>OK, why is it so difficult to convert?</strong></p>
<p>If everyone wrote their documents with the intent that they be standardized and converted, life would be easy (and we wouldn’t have that much to do). But the reality is most content is not easily extractable, and lacks the details needed for a full conversion. Much needs to be corrected, and much needs to be inferred based on the content.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/PDF-icon.jpg" rel="shadowbox[sbpost-1318];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/PDF-icon-e1326826059190.jpg" alt="" title="PDF" width="160" height="160" class="alignleft size-full wp-image-1335" /></a>As an example, let’s look at the difficulties in extracting content from PDF files. Since PDF is a print format, PDF documents are typically less-structured versions of their word-processor originals. While PDF content is laid out to look good, it includes very little structure—that is, it contains few clues as to the function of text elements (e.g., paragraphs, spaces, line breaks) or how they ought to be displayed in a different context (for instance, an e-book). While converting thoroughly structured content to XML is straightforward, PDF doesn’t contain explicit structuring. But an even more basic problem has to do with properly extracting the content from the PDF to begin with.</p>
<p>Examples of problems you are likely to encounter with commercial packages include the following:</p>
<p><strong>Incorrect Word Spaces</strong></p>
<p>Spacing is usually extracted correctly. But because PDF documents create spaces visually (i.e., they are not really labeled as “one standard space” or “two standard spaces”), conversion software sometimes misinterprets the spacing between words, adding or deleting spaces incorrectly during PDF-to-Word extraction. That’s why <a href="http://rosscarter.com/2011/385.html">ebooks that have not been fully reviewed</a> will have words run together or otherwise incorrectly spaced.</p>
<p><strong>Paragraph Delineation</strong></p>
<p>In most cases, PDF documents contain no explicit information to indicate where a paragraph begins or ends, so this too must be guessed at by conversion software, based on “visual” interpretation of the appearance of chunks of text. While conversion software frequently does guess correctly, paragraph delineation can be a source of extraction errors, particularly when paragraphs are very short or span pages, or when images and tables get in the way.</p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/City_and_the_City_E-text_soft_hyphen_as_hyphen-space.png" rel="shadowbox[sbpost-1318];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/City_and_the_City_E-text_soft_hyphen_as_hyphen-space-e1326827297404.png" alt="" title="City_and_the_City_E-text_soft_hyphen_as_hyphen-space" width="250" height="375" class="alignright size-full wp-image-1339" /></a><strong>Hyphens</strong></p>
<p>Hyphens pose a problem because they serve various purposes among which an automated system cannot distinguish. While the hyphen joining a term such as “half-life” should appear no matter where the words are placed within a document, a hyphen that appears halfway through a word because of a line break (e.g., hyphen-ated) becomes an ugly error once the word is moved to the middle of a line. This is also something you’ll see often in ebooks you download.</p>
<p><strong>Emphasis</strong></p>
<p>Depending on how a document is rendered in PDF, extracting the correct emphasis from a PDF document can sometimes pose problems for conversion software. Again, this is because PDF structure is nothing more than a visual representation; while text may appear emphasized, the PDF does not tag it as “emphasized”—conversion software must make its best guess based on what it can glean from the text’s appearance.</p>
<p><strong>Superscripting and Subscripting</strong></p>
<p>Since PDF documents’ treatment of super and subscripts is limited to the way they appear when laid out in the PDF (rather than by some kind of “superscript” or “subscript” tag), extraction software tends to run into problems with determining the vertical alignment of text. As a result, super and subscripts are frequently misinterpreted by extraction software.</p>
<p><strong>Special Characters</strong></p>
<p><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/City_and_the_City_E-text_Beszel.png" rel="shadowbox[sbpost-1318];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/City_and_the_City_E-text_Beszel-e1326827379155.png" alt="" title="City_and_the_City_E-text_Beszel" width="250" height="375" class="alignright size-full wp-image-1341" /></a>In PDF documents, special characters like foreign or mathematical symbols are frequently represented by unusual or proprietary fonts. In order to extract them to a word processor, these characters first need to be converted to a more standard character representation (e.g., ISO or Unicode). While many conversion software suites build conversion tables to handle such characters, it is impossible to keep up with the vast variety of atypical and proprietary fonts in use, and so many special characters fail to extract properly.</p>
<p><strong>Sub-fonting</strong></p>
<p>PDF’s approach to font embedding is another obstacle to proper extraction. Sometimes when PDFs are created, the PDF document does not store the information for the entire font, but rather stores only the parts of the font that are used in a given document. The characters within this “sub-font” are accessed via an indirect table within the PDF document itself, making correct interpretation and extraction of sub-fonted characters difficult. Many conversion tools cannot extract these characters at all, and produce “garbage” text instead of accurately extracted content.</p>
<p><strong>Tables</strong></p>
<p>Tables are among the trickiest document elements to extract. This is because the appearance of even a simple table is determined by numerous attributes, including but not limited to column and row delineation, header and body delineation, vertical and horizontal cell spanning, cell separators, and vertical and horizontal cell alignment. With none of this information included in the source PDF, it is nearly impossible for an automated tool to reproduce a table exactly as it appeared in the original document.</p>
<p>While some short or simple documents may be able to undergo a PDF-to-Word (and subsequent Word-to-EPUB) conversion with minimal difficulty, any long or complex document set will encounter several of these obstacles. The obstacles inherent in any PDF text extraction should underscore, first, the utility of retaining original versions of source documents in word processor format, if possible; and second, the critical importance of a good quality assurance strategy in any conversion process.</p>
<p><strong>So what do I do now?</strong></p>
<p>Obviously, many millions of pages get converted; we convert millions of pages ourselves. There are solutions and approaches for dealing with all of the above, and more, which will be discussed in future columns.</p>
<p><strong>Questions about conversion?</strong></p>
<p>If you have specific questions about conversion that you’d like me to answer, use the comment feature of this website to submit them. I’ll do my best to find an answer for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/01/17/understanding-content-conversion-unfortunately-there%e2%80%99s-no-%e2%80%98easy%e2%80%99-button/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>What is Dynamic Publishing, Anyway?</title>
		<link>http://www.thedynamicpublisher.com/2012/01/09/what-is-dynamic-publishing-anyway/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-is-dynamic-publishing-anyway</link>
		<comments>http://www.thedynamicpublisher.com/2012/01/09/what-is-dynamic-publishing-anyway/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 22:35:47 +0000</pubDate>
		<dc:creator>Scott Abel</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[definition]]></category>
		<category><![CDATA[terminology]]></category>
		<category><![CDATA[vocabulary]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1303</guid>
		<description><![CDATA[It would seem a straightforward assignment: defining the term &#8212; dynamic publishing &#8212; that is the primary focus of this online publication. But, as I quickly discovered in my quest to help establish a common vocabulary, my task was far more complicated than I originally realized. As it turns out, there are subtle variations in [...]]]></description>
			<content:encoded><![CDATA[<p><div id="attachment_1275" class="wp-caption alignright" style="width: 173px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/scottabelheadshot.png" rel="shadowbox[sbpost-1303];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/scottabelheadshot.png" alt="" title="scottabelheadshot" width="163" height="176" class="size-full wp-image-1275" /></a><p class="wp-caption-text">Scott Abel, Editor, TheDynamicPublisher.com</p></div>It would seem a straightforward assignment: defining the term &#8212; dynamic publishing &#8212; that is the primary focus of this online publication. But, as I quickly discovered in my quest to help establish a common vocabulary, my task was far more complicated than I originally realized. </p>
<p>As it turns out, there are subtle variations in the definitions industry leaders and veteran consultants use to explain what they mean when they talk about dynamic publishing. In fact, some of the most widely cited experts don&#8217;t offer up a clear-cut definition for the term, opting instead to define related terms like &#8220;dynamic content&#8221; or &#8220;dynamic delivery&#8221;.</p>
<p>According to <a href="http://www.thedynamicpublisher.com/author/ann/">Ann Rockley</a>, author of <a href="http://www.amazon.com/gp/product/032181536X/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&#038;pf_rd_s=lpo-top-stripe-1&#038;pf_rd_t=201&#038;pf_rd_i=0735713065&#038;pf_rd_m=ATVPDKIKX0DER&#038;pf_rd_r=09Y485X6KC1B1KAW750M">“Managing Enterprise Content: A Unified Content Strategy”</a> (Second Edition) [New Riders, 2012], what differentiates dynamic content from its static cousin is that “dynamic content does not exist in or as a document; it is information that is assembled only when it is requested. It exists as a series of information objects that are assembled in response to the user’s requests or [other] requirements.”</p>
<p><div id="attachment_1308" class="wp-caption alignleft" style="width: 160px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/MEC-Cover.jpg" rel="shadowbox[sbpost-1303];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/MEC-Cover-e1326148185557.jpg" alt="" title="MEC Cover" width="150" height="192" class="size-full wp-image-1308" /></a><p class="wp-caption-text"> </p></div>“Dynamic content,&#8221; Rockley writes, &#8220;is content that is automatically assembled to meet users’ specific needs, providing them with exactly what they are looking for, when they are looking for it, and in the format they are looking for it in&#8221;. </p>
<p>Rockley&#8217;s explanation of dynamic content is straightforward, but isn&#8217;t sufficient to answer the question, what is dynamic publishing?</p>
<p>JoAnn Hackos, in her 2002 work, <a href="http://www.comtech-serv.com/content_book.shtml">“Content Management Strategies for Dynamic Web Delivery”</a> (Wiley) discusses some of the benefits of &#8220;presenting content dynamically&#8221;, including the &#8220;great potential to make web-based content-rich resources more valuable to users.&#8221;</p>
<p>&#8220;Users appear eager to work with resources that are ‘customized’ to their needs and respond to their queries effectively,&#8221; Hackos writes. They also appear &#8220;to prefer personalizing information resources that they use frequently.&#8221;</p>
<p><div id="attachment_1311" class="wp-caption alignright" style="width: 160px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/Hackos.jpg" rel="shadowbox[sbpost-1303];player=img;"><img src="http://www.thedynamicpublisher.com/wp-content/uploads/2012/01/Hackos-e1326148334254.jpg" alt="" title="Hackos" width="150" height="184" class="size-full wp-image-1311" /></a><p class="wp-caption-text"> </p></div>Hackos makes clear some of the benefits of providing content dynamically, especially on the web, while also introducing an additional term in need of defining: personalization.</p>
<p>The founding sponsors of <a href="http://TheDynamicPublisher.com">TheDynamicPublisher.com</a>, <a href="http://www.quark.com">Quark Software</a>, say they believe dynamic publishing is “based on two fundamental principles: Using structured, reusable XML content and automating the delivery of this content to any media type.”</p>
<p>Quark emphasizes automation of all processes involved in an end-to-end dynamic publishing solution, including the automation of:</p>
<ul>
<li>Content reuse</li>
<li>Layout</li>
<li>Workflow</li>
<li>Formatting and multi-channel publishing</li>
<li>Custom and personalized content</li>
</ul>
<p>And while all of this information is interesting and thought-provoking, I am still left without a clear and unambiguous definition for dynamic publishing. Shouldn&#8217;t there be a <a href="http://en.wikipedia.org/wiki/Dynamic_publishing">Wikipedia page dedicated to this topic</a> by now? As it turns out, there is. But, unfortunately, while the popular online user-generated encyclopedia is often useful for solving semantic challenges, in this instance it isn&#8217;t much help.</p>
<p><strong>So What&#8217;s An Editor To Do?</strong></p>
<p>After struggling with this challenge for several days, I realized that perhaps the best way to come up with a solid definition was to ask the community for help. That&#8217;s where you come in. </p>
<p>What is your definition of dynamic publishing? Please use the commenting feature of this blog to share your views on the subject. Next week I&#8217;ll summarize our findings and attempt to craft a definition that encompasses much of our thinking. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2012/01/09/what-is-dynamic-publishing-anyway/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Introducing TheDynamicPublisher.com Version 2.0</title>
		<link>http://www.thedynamicpublisher.com/2011/12/30/introducing-thedynamicpublisher-com-version-2-0/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=introducing-thedynamicpublisher-com-version-2-0</link>
		<comments>http://www.thedynamicpublisher.com/2011/12/30/introducing-thedynamicpublisher-com-version-2-0/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 22:56:11 +0000</pubDate>
		<dc:creator>Scott Abel</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[dynamic publishing]]></category>

		<guid isPermaLink="false">http://www.thedynamicpublisher.com/?p=1248</guid>
		<description><![CDATA[by Scott Abel, The Content Wrangler It’s hard to deliver the right information to the right people at the right time in the right format and language, increasingly, on a menagerie of mobile devices that seem to pop up faster than daisies after a good rain &#8212; especially if you rely on traditional publishing techniques. [...]]]></description>
			<content:encoded><![CDATA[<p>by Scott Abel, <a href="http://www.thecontentwrangler.com">The Content Wrangler</a></p>
<div class="mceTemp" style="text-align: left;">
<div id="attachment_1275" class="wp-caption alignright" style="width: 173px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/scottabelheadshot.png" rel="shadowbox[sbpost-1248];player=img;"><img class="size-full wp-image-1275" title="scottabelheadshot" src="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/scottabelheadshot.png" alt="" width="163" height="176" /></a><p class="wp-caption-text">Scott Abel, Editor, TheDynamicPublisher.com</p></div>
<p>It’s hard to deliver the right information to the right people at the right time in the right format and language, increasingly, on a menagerie of mobile devices that seem to pop up faster than daisies after a good rain &#8212; especially if you rely on traditional publishing techniques. If your organization is like most others, no matter how hard you try, or how much money you throw at the problem, your reliance on print-centric processes and outdated, labor-intensive workflows fails you. But you’re not alone. There are people who understand your problem and are willing to share what they’ve learned with you in order to help you begin to think outside the traditional publishing box.</p>
</div>
<p>Enter <a href="http://www.">TheDynamicPublisher.com</a>. Originally a website dedicated to information about dynamic publishing as envisioned by <a href="http://www.quark.com/">Quark Software</a>, today TheDynamicPublisher.com has been rebooted as a vendor-neutral, one-stop shop for information about dynamic publishing and related topics. I’m your host, <a href="http://www.linkedin.com/in/scottabel">Scott Abel</a>, <a href="http://www.thecontentwrangler.com">The Content Wrangler</a>. My job is to “wrangle” relevant, informative, and useful content about dynamic publishing from the world’s best and brightest experts and present it to you.</p>
<div id="attachment_1267" class="wp-caption alignleft" style="width: 215px"><a href="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/mobiledevicessmall-e1325281744866.jpg" rel="shadowbox[sbpost-1248];player=img;"><img class="size-full wp-image-1267" title="100(48).jpg" src="http://www.thedynamicpublisher.com/wp-content/uploads/2011/12/mobiledevicessmall-e1325281895769.jpg" alt="" width="205" height="151" /></a><p class="wp-caption-text">Dynamic publishing makes possible the efficient delivery of content to mobile devices, on demand</p></div>
<p>That’s not all. I’m also responsible for developing this site into a vibrant community of content professionals interested in promoting the methods, standards, and software tools required to create, assemble, and deliver relevant content to those who need it, dynamically, on demand.</p>
<p>In order to reach my goal, I’ll be seeking assistance from the crowd to help me build a body of knowledge and a community that will be a resource to those who want to move away from inefficient, old school ways of creating, managing and delivering content. Won’t you join me?</p>
<p><strong>How can you get involved?</strong></p>
<p>If you’ve got questions about dynamic content and related topics, <a href="mailto:scottabel@mac.com">send them to me via email</a> and I’ll find an expert or two to answer them for you.</p>
<p>If you’re a subject matter expert and would like to contribute an article, <a href="mailto:scottabel@mac.com">let me know</a> what you’d like to write about and how it relates to dynamic publishing.</p>
<p>If you work for a company that produces software products or related services designed to help organizations produce dynamic content and you’d like to become a sponsor, <a href="mailto:scottabel@mac.com">send me an email</a> and I’ll let you know how you can get involved.</p>
<p>And, if you’re interested in the topics presented on these digital pages, then read, comment and question. Become part of the community!</p>
<p>TheDynamicPublisher.com is a work in progress. We&#8217;ll be updating and improving the site as we go. If you have ideas about how we can make the site better, please <a href="mailto:scottabel@mac.com">let me know</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2011/12/30/introducing-thedynamicpublisher-com-version-2-0/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Digital Omnivores Create Demand for Cross-platform Strategies</title>
		<link>http://www.thedynamicpublisher.com/2011/10/28/digital-omnivores-create-demand-for-cross-platform-strategies/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=digital-omnivores-create-demand-for-cross-platform-strategies</link>
		<comments>http://www.thedynamicpublisher.com/2011/10/28/digital-omnivores-create-demand-for-cross-platform-strategies/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 15:12:14 +0000</pubDate>
		<dc:creator>Sophia Farina</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Cross-Media Publishing]]></category>
		<category><![CDATA[Digital Publishing]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[White Papers]]></category>

		<guid isPermaLink="false">http://www.quarkvista.com.php5-25.dfw1-2.websitetestlink.com/?p=349</guid>
		<description><![CDATA[If there’s one thing we can count on, it’s that the desire for digital content consumption is not a fad; it’s here to stay. comScore recently studied the effect of the increased demand for digital content in a white paper they released titled Digital Omnivores: How Tablets, Smartphones and Connected Devices are Changing U.S. Digital Media [...]]]></description>
			<content:encoded><![CDATA[<p>If there’s one thing we can count on, it’s that the desire for digital content consumption is not a fad; it’s here to stay. comScore recently studied the effect of the increased demand for digital content in a white paper they released titled Digital Omnivores: <a href="http://www.comscore.com/Press_Events/Presentations_Whitepapers/2011/Digital_Omnivores" target="_blank"><em>How Tablets, Smartphones and Connected Devices are Changing U.S. Digital Media Consumption Habits</em></a>. Mark Donovan, comScore Senior Vice President of Mobile, summarized the results in their press release:</p>
<p><img class="alignleft" src="/wp-content/uploads/2011/11/digital_omnivores_1.jpg" alt="digital omnivore" width="250" height="250" />“The popularization of smartphones and the introduction of tablets and other Web-enabled devices — collectively termed ‘connected devices’ — have contributed to an explosion in digital media consumption. As these devices gain adoption, we have also seen the rise of the ‘digital omnivores’ — consumers who access content through several touchpoints during the course of their daily digital lives. In order to meet the needs of these consumers, advertisers and publishers must learn to navigate this new landscape so they develop cross-platform strategies to effectively engage their audiences.<sup>1</sup>”</p>
<p>Cross-media and dynamic publishing are Quark&#8217;s focus. But Quark also sees a significant need for our customers to create an engaging customer experience, whether it’s in print, via the Web, or on a digital device such as an iPad, an Android tablet, or another tablet.</p>
<p>In my last blog post, I spoke about cross-media design; now I’d like to focus on workflow collaboration. As the rate of content consumption changes, so do the requirements for publishing systems to support new types of content, namely digital, and to bring new content specialists into content creation workflows.</p>
<h3>Digital Assets and Workflows</h3>
<p>Once you go digital, you’ll need to embrace the need for rich media content, whether it’s for your Web site or your iPad app; therefore, the first step in preparing to adopt a cross-media publishing strategy is to include support for creating and managing your digital assets and publications. This includes videos, slideshows, podcasts, any other interactive or multimedia assets, and iPad apps and issues.</p>
<p>If your team isn’t big enough or if you don’t have enough resources to create these new types of assets, no need to fret; there are services available (such as iStockphoto®) that saw the need to expand their portfolios beyond photos and now offer video, audio, and Flash content.</p>
<p>On the other hand, if you do have in-house multimedia staff, you’ll want to make sure you include them in your workflows. It will be important that your digital issue designer can collaborate with, let’s say, your video editor to ensure that the video you include in your iPad issue delivers the best possible user experience (e.g., full-screen display). When working together, these two creative minds will surely deliver an immersive, brand-worthy experience.</p>
<p>On another level, if you’re trying to manage the creation of assets and publish content across documents, media, and platforms, you’ll also want to be able to set up specific workflows to support each type of digital asset and publication required. These workflows are vastly different from what may be required for print, in that the dependencies and timeline or workflow automation triggers may change.</p>
<h3>iPad Continues to Dominate</h3>
<p>“iPads dominate among tablets in driving digital traffic. In August 2011, iPads delivered 97.2 percent of all tablet traffic in the U.S. iPads have also begun to account for a higher share of Internet traffic than iPhones (46.8 percent vs. 42.6 percent of all iOS device traffic).<sup>1</sup>” As the iPad continues to dominate, the need to publish to the iPad is gaining ground. Thus, the ability to create an iPad app and publish an issue to that app should be available in your cross-media publishing system.</p>
<p>With that aim in mind, as you evaluate the best cross-platform strategy for you, it’s imperative that you understand what’s involved and leverage best practices shared by industry experts. A good next step in your exploration, if you haven’t already taken it, is to check out <a href="http://publish.quark.com/content/eSeminarWorldofDynamicPublishing" target="_blank">The World of Dynamic Publishing</a>, a five-part webinar series brought to you by Quark Software Inc. with leading industry experts.</p>
<hr />
<p><sup>1</sup>comScore. (2011, October 10). Smartphones and Tablets Drive Nearly 7 Percent of Total U.S. Digital Traffic [Press Release]. Retrieved from <a href="http://www.comscore.com/Press_Events/Press_Releases/2011/10/Smartphones_and_Tablets_Drive_Nearly_7_Percent_of_Total_U.S._Digital_Traffic" target="_blank">http://www.comscore.com/Press_Events/Press_Releases/2011/10/Smartphones_and_Tablets_Drive_Nearly_7_<br />
Percent_of_Total_U.S._Digital_Traffic</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thedynamicpublisher.com/2011/10/28/digital-omnivores-create-demand-for-cross-platform-strategies/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

