This is more of a small personal note than anything else, but writing a Unicode string to a UTF-8 file is a bit tricky if you're using PyQt and relying on implicit conversion between QString and Python's built-in unicode string.
What I was trying to achieve: get the contents of a text box and save them, as Unicode, to a file.
Here's what actually didn't work:
def exportFile (self, filename):
    """Procedure to export the UNICODE contents to a file"""
    txtOutput = self.findChild (QtGui.QPlainTextEdit, "txtTamil")
    fcontents = unicode (txtOutput.toPlainText(), "utf-8")
    f = codecs.open (filename, "w", encoding="utf-8")
    f.write ( fcontents )
    f.close ()
Most confusing, as the output file ended up containing a series of question marks instead of the actual Unicode characters. Surely I was doing everything right?
After investigating the Python side fully, I turned to Qt's QString class for inspiration. It turns out that you need to convert the QString to a UTF-8 byte string first, using QString's toUtf8() function, before calling Python's unicode() function.
Code which works as expected:
def exportFile (self, filename):
    """Procedure to export the UNICODE contents to a file"""
    txtOutput = self.findChild (QtGui.QPlainTextEdit, "txtTamil")
    fcontents = unicode (txtOutput.toPlainText().toUtf8(), "utf-8")
    f = codecs.open (filename, "w", encoding="utf-8")
    f.write ( fcontents )
    f.close ()
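To convince myself of the encode/decode logic without involving Qt at all, here is a minimal, Qt-free sketch of the same round trip. It assumes that toUtf8() conceptually hands over the text as UTF-8 encoded bytes, and it uses a sample Tamil word and a temporary file name that are purely illustrative. The "lossy" line shows one way question marks can appear, namely when text is squeezed through an encoding that cannot represent it; I have not verified that this is exactly what PyQt does internally in the failing version.

```python
import codecs
import os
import tempfile

# "Tamil" written in Tamil script, given as escapes so the snippet stays ASCII.
# (Illustrative sample text, not taken from the editor.)
text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

# Assumption: this is conceptually what toUtf8() provides, the text as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes as UTF-8 gives back the original Unicode string.
decoded = utf8_bytes.decode("utf-8")

# A lossy conversion, by contrast, replaces each unencodable character with "?".
lossy = text.encode("ascii", "replace")

# Write the decoded text with codecs.open and read it back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "export_demo.txt")  # hypothetical file name
f = codecs.open(path, "w", encoding="utf-8")
f.write(decoded)
f.close()

f = codecs.open(path, "r", encoding="utf-8")
read_back = f.read()
f.close()
```

If the round trip is sound, read_back should compare equal to text, while lossy holds nothing but question marks.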
5 comment(s)
Does "f.write(txtOutput.toPlainText())" work?
Comment by tim (visitor) on 14 Mar 2009 @ 21:19 IST #
Comment by Hari (blog owner) on 14 Mar 2009 @ 21:28 IST #
Comment by tim (visitor) on 15 Mar 2009 @ 04:29 IST #
  File "/home/hari/Projects/PyTamEditor/pytameditor_main.py", line 153, in onFileExport
    self.exportFile (filename)
  File "/home/hari/Projects/PyTamEditor/pytameditor_main.py", line 111, in exportFile
    f.write (txtOutput.toPlainText().toUtf8())
  File "/usr/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/usr/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
Comment by Hari (blog owner) on 15 Mar 2009 @ 08:03 IST #
Comment by Hari (blog owner) on 15 Mar 2009 @ 08:08 IST #