<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; HTML</title>
	<atom:link href="http://mitcho.com/blog/tag/html/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 02:04:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Disgusting Word-formatted HTML and how to fix it</title>
		<link>http://mitcho.com/blog/projects/disgusting-word-formatted-html-and-how-to-fix-it/</link>
		<comments>http://mitcho.com/blog/projects/disgusting-word-formatted-html-and-how-to-fix-it/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 21:29:44 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[MITWPL]]></category>
		<category><![CDATA[Office]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[word]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=3287</guid>
		<description><![CDATA[In working on a new website for the MIT Working Papers in Linguistics, I recently inherited a collection of HTML files with all of our books&#8217; abstracts. To my dismay (but not surprise) the markup in these files was horrendous. Here are some of the cardinal sins of markup that I saw committed in these [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
<li><a href='http://mitcho.com/blog/observation/%e5%8f%8e%e9%9b%86-vs-%e5%9b%9e%e5%8f%8e-and-better-word-meanings-through-usage/' rel='bookmark' title='回収 vs. 収集 and Better Word Meanings Through Usage'>回収 vs. 収集 and Better Word Meanings Through Usage</a></li>
<li><a href='http://mitcho.com/blog/life/the-most-beautiful-word/' rel='bookmark' title='The Most Beautiful Word'>The Most Beautiful Word</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>In working on a new website for the MIT Working Papers in Linguistics, I recently inherited a collection of HTML files with all of our books&#8217; abstracts. To my dismay (but not surprise) the markup in these files was horrendous. Here are some of the cardinal sins of markup that I saw committed in these files:</p>

<ol>
<li><strong>Confusing <code>id</code>s and <code>class</code>es.</strong> <code>id</code>s should be unique on the page&#8230; but here&#8217;s a case where the same <code>id</code> is repeated just to give several elements the same formatting.<br/></li>
</ol>


<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;div id=&quot;indent&quot;&gt; &lt;div id=&quot;number&quot;&gt;4.2.1&lt;/div&gt; &lt;div id=&quot;page&quot;&gt;161&lt;/div&gt; &lt;div id=&quot;section&quot;&gt;Old French (Adams 1987)&lt;/div&gt;
&lt;/div&gt; &lt;div id=&quot;indent&quot;&gt; &lt;div id=&quot;number&quot;&gt;4.2.2&lt;/div&gt; &lt;div id=&quot;page&quot;&gt;164&lt;/div&gt; &lt;div id=&quot;section&quot;&gt;The evolution of the dialects of northern Italy&lt;/div&gt;</pre></div></div>


<ol>
<li><strong>Putting a class on every instance of something.</strong> Every paragraph should be formatted equivalently. We get the point.<br/></li>
</ol>


<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;p class=MsoNormal&gt;&lt;b&gt;The English Noun Phrase in Its Sentential Aspect&lt;/b&gt;&lt;/p&gt;
&lt;p class=MsoNormal&gt;Steven Paul Abney&lt;/p&gt;
&lt;p class=MsoNormal&gt;May 1987&lt;/p&gt;</pre></div></div>


<ol>
<li><strong>Using blank space for formatting.</strong>  <br/></li>
</ol>


<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;p class=MsoNormal&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;</pre></div></div>


<ol>
<li><strong>CSS styles that don&#8217;t exist.</strong> Browsers just ignore these anyway&#8230; <br/></li>
</ol>


<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;p class=MsoNormal&gt;One factor in determining which worlds a modal quantifies
over is the temporal argument of the modal’s accessibility relation.&lt;span
style='mso-spacerun:yes'&gt;  &lt;/span&gt;It is well-known that a higher tense affects
the accessibility relation of modals.&lt;span style='mso-spacerun:yes'&gt; 
&lt;/span&gt;What is not well-known is that there are aspectual operators high enough
to affect the accessibility relation of modals.&lt;span style='mso-spacerun:yes'&gt; 
&lt;/span&gt;</pre></div></div>


<h3>The solution</h3>

<p>My solution was to write a Perl script which takes care of a number of these issues. It&#8217;s not foolproof and doesn&#8217;t involve any voodoo (for example, it can&#8217;t retypeset things which were formatted using whitespace), but it does a good job as a first pass.</p>
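
<p>For readers curious what such a first pass might look like, here is a minimal sketch in PHP. It is not the Perl script itself: the function name <code>clean_word_html</code> and the particular regular expressions are assumptions made for illustration, and they only cover the patterns shown above (<code>class=MsoNormal</code>, <code>mso-spacerun</code> spans, and empty <code>&lt;o:p&gt;</code> paragraphs).</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper (not taken from cleanwordhtml.pl): a regex-based
// first pass over Word-exported HTML.
function clean_word_html(string $html): string
{
    // Patterns are applied in order; each maps to its replacement.
    $rules = [
        // Drop Word's class attributes, e.g. class=MsoNormal.
        '/\s+class=([&quot;\']?)Mso\w+\1/i' =&gt; '',
        // Collapse mso-spacerun spacer spans to a single space.
        '/&lt;span\s+style=([&quot;\'])mso-spacerun:\s*yes\1&gt;.*?&lt;\/span&gt;/is' =&gt; ' ',
        // Remove paragraphs that only contain &lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;.
        '/&lt;p[^&gt;]*&gt;\s*&lt;o:p&gt;\s*(?:&amp;nbsp;)?\s*&lt;\/o:p&gt;\s*&lt;\/p&gt;/i' =&gt; '',
        // Strip any remaining &lt;o:p&gt; tags, keeping their contents.
        '/&lt;\/?o:p&gt;/i' =&gt; '',
    ];
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace(array_keys($rules), array_values($rules), $html) ?? $html;
}

$dirty = '&lt;p class=MsoNormal&gt;May 1987&lt;/p&gt;&lt;p class=MsoNormal&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;';
echo clean_word_html($dirty), &quot;\n&quot;; // prints &lt;p&gt;May 1987&lt;/p&gt;</pre></div></div>

<p>Regular expressions are a blunt tool for HTML, so a sketch like this is only reasonable as a first pass over predictable, machine-generated markup such as Word&#8217;s.</p>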

<div class="files">
<div class="file">
<a href="http://mitcho.com/blog/wp-content/uploads/2009/12/cleanwordhtml.pl_.txt">cleanwordhtml.pl</a><br/>
<span class="specs">perl</span>
</div>
</div>

<p>You can run the script by making it executable (<code>chmod +x cleanwordhtml.pl</code>) then specifying a target filename as an argument. For example,</p>


<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">.<span style="color: #000000; font-weight: bold;">/</span>cleanwordhtml.pl source.html <span style="color: #000000; font-weight: bold;">&gt;</span> clean.html</pre></div></div>


<p>I used this with a simple bash for loop to run over all my files:</p>


<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">for</span> f <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #000000; font-weight: bold;">*/*</span>.html; <span style="color: #000000; font-weight: bold;">do</span> .<span style="color: #000000; font-weight: bold;">/</span>cleanwordhtml.pl &quot;<span style="color: #007800;">$f</span>&quot; <span style="color: #000000; font-weight: bold;">&gt;</span> &quot;<span style="color: #800000;">${f%.html}</span>-clean.html&quot;; <span style="color: #000000; font-weight: bold;">done</span></pre></div></div>


<p>Hopefully someone else can benefit from my experience.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
<li><a href='http://mitcho.com/blog/observation/%e5%8f%8e%e9%9b%86-vs-%e5%9b%9e%e5%8f%8e-and-better-word-meanings-through-usage/' rel='bookmark' title='回収 vs. 収集 and Better Word Meanings Through Usage'>回収 vs. 収集 and Better Word Meanings Through Usage</a></li>
<li><a href='http://mitcho.com/blog/life/the-most-beautiful-word/' rel='bookmark' title='The Most Beautiful Word'>The Most Beautiful Word</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/disgusting-word-formatted-html-and-how-to-fix-it/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Using Templates with YARPP 3</title>
		<link>http://mitcho.com/blog/projects/yarpp-3-templates/</link>
		<comments>http://mitcho.com/blog/projects/yarpp-3-templates/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 13:26:38 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[beta]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[template]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[WordPress Planet]]></category>
		<category><![CDATA[YARPP]]></category>
		<category><![CDATA[Yet Another Photoblog]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1270</guid>
		<description><![CDATA[Post updated January 2012 to reflect cleaner template code available with YARPP 3.4. If you have a YARPP support question not directly related to the templating feature, please use the YARPP support forums. Version 3 of Yet Another Related Posts Plugin is a major rewrite which adds two new powerful features: caching and templating. Today [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/' rel='bookmark' title='Yet Another Related Posts Plugin 2.0'>Yet Another Related Posts Plugin 2.0</a></li>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
<li><a href='http://mitcho.com/blog/projects/keep-up-with-yet-another-related-posts-plugin-with-rss/' rel='bookmark' title='Keep up with Yet Another Related Posts Plugin with RSS!'>Keep up with Yet Another Related Posts Plugin with RSS!</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><em>Post updated January 2012 to reflect cleaner template code available with YARPP 3.4.</em></p>

<p><strong>If you have a YARPP support question not directly related to the templating feature, please use <a href="http://wordpress.org/tags/yet-another-related-posts-plugin">the YARPP support forums</a>.</strong></p>

<p>Version 3 of <a href="http://mitcho.com/code/yarpp">Yet Another Related Posts Plugin</a> is a major rewrite which adds two new powerful features: caching and templating. Today I&#8217;m going to show you how you can use <em>templates</em> to customize the look of your related posts output.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Previously with YARPP you were relatively limited in the ways you could present related posts. You were able to set some HTML tags to wrap your posts in and choose how much of an excerpt (if any) to display. This limited interface worked great for many users; indeed, these options still exist in YARPP 3.0. However, there&#8217;s also a new option for those of you who want to put your PHP skills to work and have complete control over your related posts display. The option lets you choose any file in the <code>templates</code> subdirectory of YARPP.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/01/e38394e382afe38381e383a3-1.png" alt="templates interface" title="templates interface" width="410" height="163" class="alignnone size-full wp-image-1273" /></p>

<p><span id="more-1270"></span></p>

<h3>The structure of a YARPP template</h3>

<p>Let&#8217;s take a look inside the example template, included with YARPP 3 (<code>yarpp-template-example.php</code>):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre class="php" style="font-family:monospace;">&lt;h3&gt;Related Posts&lt;/h3&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>have_posts<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">:</span><span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;ol&gt;
	<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>have_posts<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">:</span> the_post<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
	&lt;li&gt;&lt;a href=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> the_permalink<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; rel=&quot;bookmark&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> the_title<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;&lt;/li&gt;
	<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endwhile</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;/ol&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">else</span><span style="color: #339933;">:</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;p&gt;No related posts.&lt;/p&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endif</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span></pre></td></tr></table></div>


<p>There are two basic parts to this (and almost every) YARPP template: (a) what you display when there are related posts and (b) what you display when there aren&#8217;t. We make this switch with the conditional on line 6. If there are related posts, we introduce an ordered list and use the <code>while</code> loop to loop over all the related posts. For each post, we use the snippet <code>the_post();</code> to load the appropriate post data, then print the line item.</p>

<p>You&#8217;ll notice that we&#8217;re using familiar template tags here such as <code>the_permalink()</code> and <code>the_title()</code>. If you&#8217;ve ever had to tweak or build a WordPress theme before, you&#8217;ll immediately feel at home. I&#8217;ll touch on this again later.</p>

<h3>The power of PHP</h3>

<p>One big advantage of this new templating system is that you can control exactly how the posts are listed, breaking out of all of the previous structural limitations. For example, in the <code>template-list.php</code> template, we put the information for each related post in an array and then concatenate the strings with <code>implode</code>. This way, we produce a comma-separated list for our readers without any stray commas before or after the list, which was impossible until now.</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">	<span style="color: #000088;">$postsArray</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>have_posts<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">:</span> the_post<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$postsArray</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'&lt;li&gt;&lt;a href=&quot;'</span><span style="color: #339933;">.</span>get_permalink<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&quot; rel=&quot;bookmark&quot;&gt;'</span><span style="color: #339933;">.</span>get_the_title<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&lt;/a&gt;&lt;/li&gt;'</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">endwhile</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">implode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">', '</span><span style="color: #339933;">,</span><span style="color: #000088;">$postsArray</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// print out a list of the related items, separated by commas</span></pre></td></tr></table></div>
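
<p>To see why <code>implode</code> avoids stray commas, here is a small standalone sketch that runs outside WordPress. The <code>$links</code> array is stand-in data assumed for illustration; in a real template those strings would be built inside the YARPP Loop as above.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Stand-in data (assumed for illustration); in a real YARPP template these
// strings would be built inside the Loop.
$links = ['&lt;a href=&quot;/a/&quot;&gt;A&lt;/a&gt;', '&lt;a href=&quot;/b/&quot;&gt;B&lt;/a&gt;', '&lt;a href=&quot;/c/&quot;&gt;C&lt;/a&gt;'];

// Appending the separator inside a loop leaves a stray &quot;, &quot; at the end.
$looped = '';
foreach ($links as $link) {
    $looped .= $link . ', ';
}

// implode() places the separator only between items.
$joined = implode(', ', $links);

echo $looped, &quot;\n&quot;;
echo $joined, &quot;\n&quot;;</pre></div></div>

<p>Collecting the items first and joining them afterwards also means an empty result needs no special handling: <code>implode</code> on an empty array simply returns an empty string.</p>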


<p>You can also run arbitrary PHP in the template file, or even run another <code>WP_Query</code>, as <code>template-random.php</code> does: it returns a random post when there are no related posts.</p>

<h3>Familiar template tags</h3>

<p>As mentioned before, the tags we use in these YARPP templates are the same as the template tags used in any WordPress template. In fact, any WordPress <a href="http://codex.wordpress.org/Template_Tags">template tag</a> will work in the YARPP <a href="http://codex.wordpress.org/The_Loop">Loop</a>. You can use these template tags to display the excerpt, the post date, the comment count, or even some custom metadata. I&#8217;ve also written two special template tags which only work within a YARPP Loop: <code>the_score()</code> and <code>get_the_score()</code>. As you might expect, these print and return, respectively, the match score of that particular related post.</p>

<p>In addition, template tags from other plugins will also work. For an example, take a look at the <code>yarpp-template-photoblog.php</code> file:</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">	<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span>have_posts<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">:</span> the_post<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
		<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #990000;">function_exists</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'yapb_is_photoblog_post'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">:</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>yapb_is_photoblog_post<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">:</span><span style="color: #000000; font-weight: bold;">?&gt;</span>
		&lt;li&gt;&lt;a href=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> the_permalink<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; rel=&quot;bookmark&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> yapb_get_thumbnail<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;&lt;/li&gt;
		<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endif</span><span style="color: #339933;">;</span> <span style="color: #b1b100;">endif</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
	<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endwhile</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span></pre></td></tr></table></div>


<p>In this template&#8217;s YARPP Loop, we use some template tags introduced by the <a href="http://wordpress.org/extend/plugins/yet-another-photoblog/">Yet Another Photoblog</a> plugin. If you have the Yet Another Photoblog plugin installed, you can use this template to display thumbnails of related posts in lieu of the titles. Notice that here we&#8217;re checking first whether each related post is indeed a photo post or not using <code>yapb_is_photoblog_post()</code> and then using the Yet Another Photoblog <code>yapb_get_thumbnail()</code> template tag to get the location of the thumbnail.</p>

<p>Templating in YARPP 3.0 enables the blog admin to uber-customize their related posts display using the lingua franca of PHP and <a href="http://codex.wordpress.org/Template_Tags">template tags</a>. Feel free to comment here with ideas, comments, and of course links to your YARPP-powered blogs. I look forward to seeing what the WordPress community does with this new feature!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>For those of you interested in the WP and SQL voodoo used to make this happen, I&#8217;ve posted <a href="http://mitcho.com/blog/how-to/external-orders-in-wordpress-queries/">a more technical article</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/' rel='bookmark' title='Yet Another Related Posts Plugin 2.0'>Yet Another Related Posts Plugin 2.0</a></li>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
<li><a href='http://mitcho.com/blog/projects/keep-up-with-yet-another-related-posts-plugin-with-rss/' rel='bookmark' title='Keep up with Yet Another Related Posts Plugin with RSS!'>Keep up with Yet Another Related Posts Plugin with RSS!</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/yarpp-3-templates/feed/</wfw:commentRss>
		<slash:comments>113</slash:comments>
		</item>
	</channel>
</rss>

