So… I study Japanese. I was searching for a way to convert some EUC-JP encoded files to UTF-8 (now standard in most OSes), and I found myself stuck with no tools to do so.
Searching the web, I found many language functions that convert lines, strings or chunks of bytes from one encoding to another, but what I have on my hands is a plain-text file (a rather large one).
I’ve used the Unicode mapping file for CodePage 936, available from the Unicode site, to do the trick. Not all characters are there, but it mostly works. Since I’m in quite a hurry, this one will do for now. Later I’ll try the BestFit 936 file to see what I get (note: the current code does not work out of the box with the BestFit file format; some tweaking is needed).
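To give an idea of how such a mapping file can be read, here is a minimal sketch. It assumes each data line holds the code page value and the Unicode value as hexadecimal numbers separated by whitespace, with comments starting at "#". The function name parse_mapping and the sample lines are made up for illustration (the D0C2 to 65B0 pair is the one quoted in the comments below); this is not the actual script.

```python
def parse_mapping(lines):
    """Build a {code page value: Unicode code point} dict from mapping lines."""
    table = {}
    for line in lines:
        # Drop the trailing comment (everything after '#') and whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Entry with no Unicode value (assumed: e.g. a lead-byte marker).
            continue
        # int(..., 16) accepts the "0x" prefix.
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


# Illustrative sample lines, assumed to resemble the mapping file layout.
sample = [
    "# comment line",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x81\t\t#DBCS LEAD BYTE",
    "0xD0C2\t0x65B0\t#CJK UNIFIED IDEOGRAPH",
]
table = parse_mapping(sample)
print(hex(table[0xD0C2]))
```

With a table like this, converting a character comes down to looking up its code page value and writing chr(code_point) to a file opened with encoding="utf-8".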
So, with the code page in hand, I figured I could write a small Python script to convert my file to another encoding (UTF-8, in this case), and I’m sharing it with everyone (GPL license).
It was coded on a Mac using Python 3.1, but it also works on Windows (I’ve tested it on XP). Note that it’s quite crude, and you need to alter some settings at the beginning of the file (source file and destination file at least).

3 Comments
I have some old flashcard files from when I used Kingkanji on my PDA. All those files are saved as EUC-JP text files, and I would like to try and write a script that can change them to use with Anki. I’ll give it a try and let you know how it goes.
Hi, I needed to convert some EUC-JP encoded text to UTF-8 and Google led me here. I had some trouble with the code page 936 though. For example, the kanji for “new” (i.e. “shin”, or “atarashii”) should be EUC-JP code BFB7, which is E696B0 in UTF-8. The second column in the cp936 file seems to be UTF-16. “Shin” is 65B0 in UTF-16, which maps to D0C2 in the first column. I think that is the GB2312 Chinese code set. Also, the Wikipedia page on Code page 936 says it’s for Chinese (?)
BUT, I’m a newb here so I’m just googling around trying to figure it out.
I did find a nice solution though. “iconv” in UNIX or this Windows version:
http://gnuwin32.sourceforge.net/packages/libiconv.htm
Thanks.
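The byte values quoted in this comment can be checked with Python's built-in codecs. This is only a small sketch, assuming a Python 3 version recent enough to have bytes.hex() (3.5 or later); the variable names are illustrative.

```python
# Start from the EUC-JP bytes quoted in the comment and decode them.
euc_jp_bytes = bytes.fromhex("BFB7")
char = euc_jp_bytes.decode("euc_jp")

# Re-encode the same character in the other encodings mentioned.
code_point = "U+{:04X}".format(ord(char))
utf8_hex = char.encode("utf-8").hex().upper()
utf16_hex = char.encode("utf-16-be").hex().upper()
gb2312_hex = char.encode("gb2312").hex().upper()

print(code_point, utf8_hex, utf16_hex, gb2312_hex)
```

If the figures in the comment are right, this prints U+65B0 E696B0 65B0 D0C2, which would confirm that D0C2 is the GB2312 (Chinese) code for the character and not its EUC-JP code.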
@Greg
Hi Greg!
Are you using Python 3.1? If you are, there is a much (much!) easier way to convert EUC-JP to UTF-8 characters from a file.
Take the EDict file (http://www.csse.monash.edu.au/~jwb/edict.html) for example. It’s encoded in EUC-JP, and you can convert it with a few simple lines:
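fp = open("edict", "rb")
for row in fp:
    # Each row is a line of raw bytes; decode it from EUC-JP to a str.
    # end="" avoids doubling the newline already present in the line.
    print(row.decode("EUC-JP"), end="")
fp.close()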
Remember to decompress the edict file first.
Hope it helps!
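To write the result to a UTF-8 file instead of printing it, something along these lines should work. It is a minimal sketch built on Python's own codecs; the function name convert_file, its parameters and the chunk size are my own choices for illustration, not part of the script above.

```python
def convert_file(src_path, dst_path, src_encoding="euc_jp",
                 dst_encoding="utf-8", chunk_chars=64 * 1024):
    """Re-encode a text file chunk by chunk, so large files need little memory."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            # read() on a text file returns decoded characters, so a
            # multi-byte character is never split between two chunks.
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

For the EDict example that would be convert_file("edict", "edict.utf8.txt"), where the output name is just a placeholder. A byte sequence that is not valid EUC-JP raises UnicodeDecodeError, which is a quick hint that the source encoding is something else.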