Welcome to the OvniConv project.

====== Presentation ======

This project develops OpenDocumentFormat tools to help convert files written in legacy Vietnamese encodings (TCVN 5712:1993, VNI, VPS, and so on) to TCVN 6909:2001 (Unicode).

===== Second try: more proof of concept! =====

  * Mar.2008:
    * Using a more evolved Python script, we can now convert font names as well!
    * Project available at https://launchpad.net/ovniconv/
    * Sources at http://bazaar.launchpad.net/~progfou/ovniconv/trunk/files
    * It's now version 1.0, ready to be used in command line mode! A Debian/Ubuntu package should come soon! :-)
  * Apr.2008:
    * Moved Python support for Vietnamese encodings into a separate project
    * Sources available at http://bazaar.launchpad.net/~progfou/python-vietnamese/trunk/files
    * Let's make it a proposal for the default Python distribution! :-)
    * Debian/Ubuntu packages available at https://launchpad.net/~progfou/+archive
      * and also as Gutsy contrib in the [[others:HanoiLUG]] [[soft:apt]] repository

====== Experimentation ======

===== First try: proof of concept! =====

  * open an old TCVN encoded MS-Office .DOC file using OOo:
    ooffice test-tcvn.doc
  * save the file in .ODT format (as test.odt), then quit OOo
  * use unzip to extract the content.xml file from the .ODT
    unzip test.odt content.xml
  * recode content.xml from UTF-8 to WINDOWS-1252
    iconv --from=UTF-8 --to=WINDOWS-1252 < content.xml > content-tcvn.xml
  * recode the result from TCVN-5712 to UTF-8, back into content.xml
    iconv --from=TCVN-5712 --to=UTF-8 < content-tcvn.xml > content.xml
  * use zip to put content.xml back into the .ODT file
    zip test.odt content.xml
  * open the .ODT file using OOo
    ooffice test.odt
  * It's all Unicode encoded! (but fonts are still declared as .vn* ones)
  * Note that some special characters (such as double quotes) are still wrongly replaced with accented Vietnamese characters. This is because we do a global raw string conversion, which also converts strings using fonts other than .vn*. The final tool will have to convert only the strings associated with a .vn* font.
  * Test file used: {{projects:test-tcvn.doc|test-tcvn.doc}}

----

About the HyphenationIssue
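The manual steps of the first try can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the OvniConv script itself: the names ''TCVN_SAMPLE'', ''reinterpret'' and ''convert_odt'' are hypothetical, and the table holds just three TCVN 5712 byte values as an excerpt, where a real converter needs the full mapping. Like the iconv pipeline, it converts all of content.xml globally, so it shares the limitation noted above for text in non-.vn* fonts.

```python
import io
import zipfile

# Illustrative excerpt only (assumption: three TCVN 5712 byte values and the
# Unicode characters they stand for). A real converter needs the full table.
TCVN_SAMPLE = {0xAE: "đ", 0xD5: "ế", 0xD6: "ệ"}


def reinterpret(text, table=TCVN_SAMPLE):
    """Undo the WINDOWS-1252 misreading, then map the bytes through the table."""
    # Same effect as: iconv --from=UTF-8 --to=WINDOWS-1252
    # (raises UnicodeEncodeError if the text has characters outside cp1252)
    raw = text.encode("cp1252")
    # Bytes missing from the sample table are passed through unchanged.
    return "".join(table.get(byte, chr(byte)) for byte in raw)


def convert_odt(data):
    """Return a copy of the .odt bytes with content.xml reinterpreted."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # Copy entries in their original order, rewriting only content.xml.
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "content.xml":
                fixed = reinterpret(payload.decode("utf-8"))
                payload = fixed.encode("utf-8")
            dst.writestr(item, payload)
    return out.getvalue()
```

For example, ''reinterpret("TiÕng ViÖt")'' gives ''"Tiếng Việt"'': the letters ''Õ'' and ''Ö'' are what a WINDOWS-1252 reader shows for the bytes that the legacy font displays as ''ế'' and ''ệ''.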