Hari's Corner

Humour, comics, tech, law, software, reviews, essays, articles and HOWTOs intermingled with random philosophy now and then

HTML inside XML - rendering with XSLT issue

Filed under: Tutorials and HOWTOs by Hari
Posted on Fri, Jul 6, 2007 at 20:09 IST (last updated: Wed, Jul 16, 2008 @ 20:39 IST)

I've been meddling a bit with XML and XSLT, mostly out of curiosity, but also because I want to try and convert my reviews site entirely to XML. I seem to have a strange fascination for XML because the possibilities seem limitless. Anyway, I digress. This topic is really to document a little discovery I made.

When you embed HTML inside XML document elements, you get something like the example below. Note the HTML tags inside <content>: they are meant as formatting tags and not as part of the XML structure.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="index.xsl" ?>

<review>
  <title>A sample review</title>
  <author>A sample author</author>
  <content>
    <p>This is the actual review content</p>
    <p>Hello World! How are you today?</p>
  </content>
</review>

Here's the XSLT file index.xsl I wrote to render the above XML file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h1>Hari's Reviews</h1>
    <hr />
    <h2><xsl:value-of select="review/title"/> by
    <xsl:value-of select="review/author"/></h2>

    <xsl:value-of select="review/content"/>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

The above code will not render correctly because <xsl:value-of select="review/content"/> returns only the string value of the node. The <p> tags inside <content> are treated as ordinary XML elements, so they are stripped and only their text comes back.

The correct code will be: <xsl:copy-of select="review/content"/>
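To see the difference outside a browser, here is a minimal Python sketch. Python's standard library has no XSLT processor, so this is only a stand-in: it uses xml.etree.ElementTree to mimic what value-of (text only) and copy-of (node with its markup) each return for the sample review. The variable names are my own and purely illustrative.

```python
import xml.etree.ElementTree as ET

# The sample review from above, with <content> kept on one line.
REVIEW_XML = """<review>
  <title>A sample review</title>
  <author>A sample author</author>
  <content><p>This is the actual review content</p><p>Hello World! How are you today?</p></content>
</review>"""

content = ET.fromstring(REVIEW_XML).find("content")

# Roughly what xsl:value-of yields: the text of all descendants, markup dropped.
as_value_of = "".join(content.itertext())

# Roughly what xsl:copy-of yields: the node copied with its child elements intact.
as_copy_of = ET.tostring(content, encoding="unicode")

print(as_value_of)
print(as_copy_of)
```

Note that the copy includes the <content> wrapper element itself. In XSLT, selecting review/content/node() instead would copy only the children of <content>, which may be preferable if you don't want a stray <content> tag in the HTML output.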

Update: The smart quotes problem in the code has been fixed.

Opera 9.21 review

Filed under: Software and Technology by Hari
Posted on Wed, Jul 4, 2007 at 13:10 IST (last updated: Thu, May 7, 2009 @ 21:19 IST)

I downloaded and installed the latest version of the Opera web browser yesterday and I've been pleasantly surprised by the new features since the last time I tried it out. Here is a list of what I really appreciate about Opera:

Not convinced? Check out the full list of features.

On the downside, I did notice a few quirks in some JavaScript code execution as well as in loading certain websites, but overall it's an excellent, mature browser, and that shows in the polish of its interface and features.

Update: There is a problem with complex Unicode font rendering. So I cannot read my Tamil posts properly in Opera as the complex characters aren't shaped properly. The bug has been reported to Opera.

Opera is now my primary web browser. Yes, I'm a convert. :cool:

Website generation using asciidoc

Filed under: Tutorials and HOWTOs by Hari
Posted on Tue, Jul 3, 2007 at 10:41 IST (last updated: Wed, Oct 1, 2008 @ 22:10 IST)

asciidoc is a pretty neat document generation and formatting system which relies on plain text files with minimal formatting codes to generate documentation in a variety of different formats. It's very easy to use and very flexible. In fact, with a bit of scripting, you can automate building an entire website using asciidoc sources. I'm aware that this method is a bit dated, what with advanced database-driven CMSes available to build dynamic websites these days, but it's still a neat way to create a simple, no-frills, fully portable website.

Here's a rough and ready script I wrote in Python to build a website from a directory tree. Feel free to customize it and use it as you wish.

Usage notes:
  1. You need to have asciidoc and python installed on your system.
  2. Save the script below as website-gen.py and make it executable (using chmod +x website-gen.py from the command line).
  3. Customize the source and destination directory in the script to your needs.
  4. Create the asciidoc sources in the source directory tree.
  5. Create a layout.conf file (optional) or remove the -f layoutfile option from the script to use default settings.
  6. Run the script to generate the output in XHTML 1.1 format in the destination directory tree.
  7. For more information on asciidoc, you can read the man page and the full online documentation.
The Python script:
#!/usr/bin/env python

import os

# The source directory containing the asciidoc .txt files
# (you can modify this to your needs)
source_dir = './source/'

# The destination directory
# (you can modify this to your needs)
dest_dir = './html'

# The layout configuration file
# (you can use your own layout file; modify it to your needs,
# or drop the -f option below to use the default settings)
layout_file = './layout.conf'

# Run through the source directory and compile the asciidoc files
for root, dirs, files in os.walk(source_dir):
    for filename in files:
        if filename.endswith('.txt'):
            fullfilepath = os.path.join(root, filename)
            # Use the commented command to generate with no
            # custom layout file
            # command = 'asciidoc ' + fullfilepath
            command = 'asciidoc -f ' + layout_file + ' ' + fullfilepath
            os.system(command)
            print(fullfilepath)

print('Completed generating documentation from sources')

# Remove the destination directory if it exists
command = 'rm -rf ' + dest_dir
os.system(command)

print('Copying to destination dir...')

# Copy the source dir to the destination
command = 'cp -dpR ' + source_dir + ' ' + dest_dir
os.system(command)

# Remove the source files from the destination directory
for root, dirs, files in os.walk(dest_dir):
    for filename in files:
        if filename.endswith('.txt'):
            os.remove(os.path.join(root, filename))

print('Completed successfully')
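
The script shells out to rm and cp, so it only works where those commands exist. If that matters to you, the copy-and-prune step can be sketched with the standard shutil module instead. This is just one possible variant under my own assumptions: the publish() helper and the throwaway demo directory are illustrative names, not part of the script above, and the result is only roughly equivalent to cp -dpR (file ownership, for example, is not preserved).

```python
import os
import shutil
import tempfile


def publish(source_dir, dest_dir):
    """Copy source_dir to dest_dir, leaving out the asciidoc .txt sources."""
    # Equivalent of 'rm -rf dest_dir': start from a clean destination.
    shutil.rmtree(dest_dir, ignore_errors=True)
    # Roughly 'cp -dpR' plus deleting *.txt from the copy, in one pass.
    shutil.copytree(source_dir, dest_dir, symlinks=True,
                    ignore=shutil.ignore_patterns('*.txt'))


# Small self-contained demonstration in a throwaway directory.
work = tempfile.mkdtemp()
src = os.path.join(work, 'source')
dst = os.path.join(work, 'html')
os.makedirs(os.path.join(src, 'docs'))
for name in ('index.txt', 'index.html',
             os.path.join('docs', 'page.txt'),
             os.path.join('docs', 'page.html')):
    with open(os.path.join(src, name), 'w') as handle:
        handle.write('placeholder\n')

publish(src, dst)

# List what ended up in the destination, relative to its root.
published = sorted(
    os.path.relpath(os.path.join(root, name), dst)
    for root, dirs, files in os.walk(dst)
    for name in files
)
print(published)

shutil.rmtree(work)
```

Only the generated .html files should end up in the destination; the .txt sources are skipped during the copy rather than deleted afterwards.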

My blogging principles

Filed under: Internet and Blogging by Hari
Posted on Fri, Jun 29, 2007 at 11:35 IST (last updated: Wed, Oct 29, 2008 @ 21:02 IST)

For what they're worth, here are some of the blogging principles I try to stick to as much as possible.

Avoiding controversy

I'm aware that the world outside is a mixture of positive and negative, good and bad, right and wrong. I'm not going too deep into the philosophical or religious aspects of this. All I can say is that there are thousands of websites out there which allow discussion of controversial topics and I try not to be one of them.

I've had my fair share of rants, but I've consciously tried to avoid voicing my thoughts on negative or controversial topics. I am especially wary about being critical of other people and issues of world-wide magnitude. Flame wars are a strict no-no on this blog.

Keeping it varied

I try not to dwell too much on the same topic again and again. I know this is easier said than done, but when I find that I have nothing much to say on a particular issue, I don't raise that issue at all. I've abandoned a lot of drafts in this manner after realizing that I've either said all I wanted to say before or really don't care much about the issue I've started writing about.

This is also a reason why I started doing cartoons and other features. It's a nice change of pace and adds colour to an otherwise dull blogging routine.

Keeping it light

Original humour is creatively very demanding, but I have consciously followed the policy of making a few light, humorous posts every now and then to maintain a positive feel to this blog. Too many serious articles tend to weigh heavily on readers over a period of time.

For the same reason, I avoid being too personal in my writings.

Keeping it (reasonably) informative

I've been a bit lazy about sharing information in the past, but I have realized the importance of documenting certain information even if only to remind myself of something in the future. For instance, I tend to get something to work in Linux after a lot of fiddling around and then months later, I keep wondering how I got it to work and repeat the whole procedure all over again.

Sometimes it's a non-technical issue, but mostly it's technical in nature, so I decided that nothing is too insignificant for documentation - in fact, the more obscure the information, the better.

Keeping it content-rich

I have a morbid dread of one-liners. I've seen a lot of bloggers post one-liners regularly and it seems to work for them, but somehow I've never brought myself to do that. In the past my posts have tended to run too long, but these days I try to maintain a balance. Short posts are good for variety, but even in my longer articles I try to avoid large blocks of paragraphs without breaks. More recently, I've started using structuring (like headings, bulleted and numbered lists and so on) to break the monotony of large chunks of text.

I'm still not sure how many people appreciate reading my longer posts though :P

I know that most of my regular readers will be aware of these aspects, but I thought I would make a post about it anyway. If any of you have blogging principles like these, I would be interested to know what they are.

Amaya - W3C's official web browser and editor

Filed under: Software and Technology by Hari
Posted on Fri, Jun 22, 2007 at 13:37 IST (last updated: Thu, May 7, 2009 @ 21:19 IST)

Amaya, W3C's official cross-platform web browser and XHTML editor, recently made it to the Debian repositories. I was curious about this little application because it is actually a pretty neat WYSIWYG editor that produces clean, fully standards-compliant code. And it's one of the few editors I've come across that allows you to create XHTML 1.0 (Transitional and Strict) as well as XHTML 1.1. Being W3C's own browser, one would expect it to be fairly up-to-date in standards compliance.

Amaya Screenshot

Like all WYSIWYG editors though, it has its drawbacks. CSS editing is a chore and you might as well use a text editor to do it and save plenty of time. Using inline style attributes for elements is possible with the built-in CSS dialog box, but I don't recommend that approach since it makes styles unwieldy, hard to re-use and difficult to manage. Unlike a lot of other HTML editors, Amaya makes it easy to use DIV elements for layout and it has an option to display the document structure clearly. However, navigating within nested elements in the document can be a bit of a pain because the cursor almost never seems to move where you want it to. It also seems to be prone to crashes so I recommend saving documents frequently.

I had a bit of fun playing around with Amaya. It certainly appears to be feature rich but all said and done, I'll probably stick to Quanta Plus for my serious web editing and development needs. But for those of you who aren't keen on messing with raw code and need a WYSIWYG editor that conforms to cutting edge web standards, Amaya is a good option.

Stupid, annoying VCD/DVD edits

Filed under: Software and Technology by Hari
Posted on Wed, Jun 20, 2007 at 17:58 IST (last updated: Wed, Jul 16, 2008 @ 21:13 IST)

If you own a sizable collection of VCD or DVD movies you would understand what I am talking about.

Why do VCD and DVD manufacturers randomly edit portions of the movies they sell? It's not censorship, because there's nothing to censor in the kind of movies I watch. Yet these distributors cut out 2 seconds here, 10 seconds there, half a minute here and two minutes there and before you know it the whole movie experience is ruined.

In one of the movies I bought recently, almost the entire climax scene was cut out! The whole movie fell flat. A 30-second cut can make all the difference between a great movie and a good one. This is exactly what happens when unprofessional idiots decide to take liberties with the video sequence.

Talk about stupidity. They let the whole (entirely useless) credits sequence run in full but decide that they didn't have enough space to fit in the most important part of the movie! One also misses subtle humour sequences because the DVD producers are too thick-headed to understand the importance or value of such scenes. Sometimes it's just a word or two cut out, but that makes all the difference, I say! For instance, in one sequence of a movie I really enjoy, the villain begs the hero to let him go during a big fight sequence at the end, but that bit was chopped off in the VCD. The scene fell flat because the tension really builds up at the moment when the hero advances upon him to beat him up in spite of his appeal.

When I pay the full price for a legal DVD or a VCD, I expect the full movie to be available, without cuts. I'm not talking about cuts made by the official censors before the movie got released, but the cuts made by these idiot DVD manufacturers in their studio. If they absolutely have to make edits, at least the fact should be clearly mentioned on the cover. If it's not mentioned, as a consumer I feel cheated out of my money. Why should people have to put up with it? Do these distributors get away with it because most consumers cannot be bothered to lodge a complaint? When the whole movie industry is crying foul about piracy, why should customers stay silent when they're being subtly cheated in this manner?

What is surprising is that this practice of random editing seems to be fairly common in the industry. I've not seen too many unedited movie VCDs or DVDs so far. Almost every single one in my collection has at least one perceptible sequence cut. Most have two or more. I really cannot understand the reason for this practice. Disk space is definitely not a constraint with DVDs and most VCD movies are packaged in sets of two. That should be enough to fit the whole movie intact. What do they gain by wasting so much effort editing?