Hari's Corner
Humour, comics, tech, law, software, reviews, essays, articles and HOWTOs intermingled with random philosophy now and then

Python's unicode strings and QString gotcha
Filed under: Bits and Bytes by Hari
Posted on Fri, Mar 13, 2009 at 17:47 IST (last updated: Wed, Mar 18, 2009 @ 20:28 IST)
Code which did not work as expected:

    def exportFile (self, filename):
        """Procedure to export the UNICODE contents to a file"""
        txtOutput = self.findChild (QtGui.QPlainTextEdit, "txtTamil")
        # Passing the QString straight to unicode() is what goes wrong here
        fcontents = unicode (txtOutput.toPlainText(), "utf-8")
        f = codecs.open (filename, "w", encoding="utf-8")
        f.write ( fcontents )
        f.close ()

Most confusing, as the output file ended up with a series of question marks instead of the actual Unicode characters. Surely I was doing everything right? After investigating the Python side fully, I turned to Qt's QString class for inspiration. It turns out that you need to convert the QString to a UTF-8 byte stream first, using QString's toUtf8() function, before calling the Python unicode() function.
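The question marks can be reproduced without Qt at all. The following is only a sketch of the likely mechanism, written for Python 3 rather than the Python 2 of this post, and it assumes that the implicit QString conversion behaves like a lossy 8-bit encode. The sample text and variable names are hypothetical and not taken from the editor's code.

```python
# Qt-free illustration; sample text and names are hypothetical.
text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil" in Tamil script, 5 code points

# Lossy path (assumed to mimic the implicit conversion): an 8-bit codec
# cannot represent Tamil, so every code point is replaced by "?".
lossy = text.encode("latin-1", "replace")

# Lossless path: UTF-8 bytes decode back to the original text, which is
# analogous to unicode(qstring.toUtf8(), "utf-8") in Python 2.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(lossy)                   # one "?" per code point
print(round_tripped == text)   # the UTF-8 round trip preserves the text
```

Once the text has been squeezed through a codec that cannot represent it, decoding the result as UTF-8 cannot bring the original characters back, which would explain why only the explicit UTF-8 step helps.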
Code which works as expected:
    def exportFile (self, filename):
        """Procedure to export the UNICODE contents to a file"""
        txtOutput = self.findChild (QtGui.QPlainTextEdit, "txtTamil")
        # Convert the QString to UTF-8 bytes first, then decode them in Python
        fcontents = unicode (txtOutput.toPlainText().toUtf8(), "utf-8")
        f = codecs.open (filename, "w", encoding="utf-8")
        f.write ( fcontents )
        f.close ()
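There are two consistent ways to get UTF-8 onto disk: hand text to a file object that does the encoding, or hand already-encoded bytes to a file opened in binary mode. Mixing the two (bytes into an encoding writer) is what makes Python 2 attempt an implicit ASCII decode. Below is a rough sketch of both routes in Python 3; the helper names export_text and export_bytes, the file names and the sample string are all hypothetical.

```python
import io
import os
import tempfile

text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # hypothetical sample text in Tamil script


def export_text(filename, contents):
    """Write text; the file object encodes it to UTF-8 on the way out."""
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write(contents)


def export_bytes(filename, data):
    """Write bytes that are already UTF-8 encoded; no codec is involved."""
    with io.open(filename, "wb") as f:
        f.write(data)


tmpdir = tempfile.mkdtemp()
path_text = os.path.join(tmpdir, "text.txt")
path_bytes = os.path.join(tmpdir, "bytes.txt")

export_text(path_text, text)
export_bytes(path_bytes, text.encode("utf-8"))

# Both routes should leave identical bytes on disk.
with io.open(path_text, "rb") as f1, io.open(path_bytes, "rb") as f2:
    same = f1.read() == f2.read()
print(same)
```

Either route is fine as long as text and bytes are not mixed; the codecs.open() approach used in this post corresponds to the first one.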
5 comment(s)
Does "f.write(txtOutput.toPlainText())" work?
Comment by tim (visitor) on Sat, Mar 14, 2009 @ 21:19 IST #
Couldn't find the reason why, though I searched the web. It seems very strange, but the only thing I could fathom is that the automatic conversion of QString to a Python string is not Unicode-aware.
I tried a lot of stuff with this, but it seems that QString's internal handling of the actual Unicode data is not friendly to Python.
Comment by Hari (blog owner) on Sat, Mar 14, 2009 @ 21:28 IST #
Both the Unicode Python string and QString use either 2 or 4 bytes per character depending on compilation options, but the two do not necessarily match (according to the docs, anyway). So, if the number of question marks is double the number of code points expected, then Python is using a 2-byte representation and QString is using a 4-byte representation. Vice versa if the number of question marks is half. It is possible that they are using two different encodings with the same size, but not likely.
Hmm, does writing out the output from toUtf8() work? Of course, you have to turn off Python's utf8 conversion and write it out as plain old bytes.
Comment by tim (visitor) on Sun, Mar 15, 2009 @ 04:29 IST #
Traceback (most recent call last):
  File "/home/hari/Projects/PyTamEditor/pytameditor_main.py", line 153, in onFileExport
    self.exportFile (filename)
  File "/home/hari/Projects/PyTamEditor/pytameditor_main.py", line 111, in exportFile
    f.write (txtOutput.toPlainText().toUtf8())
  File "/usr/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/usr/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
Comment by Hari (blog owner) on Sun, Mar 15, 2009 @ 08:03 IST #
But is that a recommended way of writing to Unicode files? I thought using codecs.open() was the recommended way?
Comment by Hari (blog owner) on Sun, Mar 15, 2009 @ 08:08 IST #