Encoding

Navajo: A Text Encoding Converter for Mac OS X

Navajo:

Navajo is a GUI front-end to the shell command “piconv”, in other words a tool for those who like mouse-clicks.

How to use it: open it, choose the input file, make up a name for the output file, choose current and desired text encoding… and bang, it’s done.

Why piconv? Because it could have been iconv as well, but it failed once and we’re not friends anymore.

I’ve never used piconv. I’ll learn more about the command and this new app. There doesn’t seem to be an option to convert every file in a folder (i.e., recursively).
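Until such an option turns up, a small shell function could cover the recursive case. This is only a sketch under a few assumptions: the name convert_tree and the *.txt filter are made up for illustration, and it calls iconv rather than piconv (piconv takes the same -f and -t options, so it should be possible to swap it in). Each file is replaced only if its conversion succeeds.

```shell
#!/bin/sh
# convert_tree DIR FROM TO
# Convert every *.txt file under DIR (recursively) from encoding FROM to
# encoding TO, rewriting each file in place.
# The function name and the *.txt filter are illustrative assumptions.
convert_tree() {
    dir=$1
    from=$2
    to=$3
    find "$dir" -type f -name '*.txt' | while IFS= read -r f; do
        tmp="$f.tmp.$$"
        # Replace the original only if iconv reports success.
        if iconv -f "$from" -t "$to" "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "skipped (conversion failed): $f" >&2
        fi
    done
}
```

For example, convert_tree ~/samples shift-jis utf-8 would convert every .txt file under ~/samples. Since files are overwritten, it is safer to try it on a copy of the folder first.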

A Simple AppleScript to Change the Encoding with CotEditor

I have tons of sample code written in Shift-JIS with CR/LF (Windows) line endings. Below is an AppleScript to convert such documents to UTF-8 with LF (Unix) line endings. I use CotEditor because its encoding detection works better (for me) than BBEdit’s.

tell application "CotEditor"
    activate
    set properties of document 1 to {line ending:LF}
    convert document 1 to "Unicode (UTF-8)" with lossy
    -- Use regular expressions to replace yen symbol with backslash
    replace document 1 for "\x{00a5}" to "\\" with RE and all
end tell

Garbage Text in Flash in the Browsers

If you use a language other than English as your primary language in Mac OS X, you may see garbage text in Flash content embedded in browsers like Safari and Firefox.

This seems to be caused by a hidden file called .CFUserTextEncoding located in your home directory. Because the file name starts with a dot (.), it can’t be seen in the Finder. You need to open the file and change the content from 0:0 to 1:14.

If you are comfortable using Terminal, run the following command there.

echo 1:14 >~/.CFUserTextEncoding

Enter the code above and press return.

You may need to log out and back in, or restart your Mac, to see the effect. This solution may also be effective for other apps like OpenOffice.org.
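Since the echo command overwrites the file, it may be worth keeping the old content around in case the change needs to be undone. A minimal sketch, assuming a hypothetical helper named set_text_encoding (it is not part of Mac OS X) that saves a .bak copy before writing the new value:

```shell
#!/bin/sh
# set_text_encoding VALUE [FILE]
# Back up FILE (default: ~/.CFUserTextEncoding) to FILE.bak, then write VALUE.
# The function name and the .bak suffix are illustrative assumptions.
set_text_encoding() {
    value=$1
    file=${2:-"$HOME/.CFUserTextEncoding"}
    # Keep a copy of the current content so the change can be reverted.
    if [ -f "$file" ]; then
        cp "$file" "$file.bak"
    fi
    printf '%s\n' "$value" > "$file"
}
```

Running set_text_encoding 1:14 has the same effect as the echo command, and the previous content stays in ~/.CFUserTextEncoding.bak.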

Another Way to Convert Encodings in TextMate

Yebisu Blog (in Japanese) shows the following snippet.

#!/usr/bin/env ruby
require "nkf"
# -E: input is EUC-JP, -w: output UTF-8, -m0: no MIME decoding, -x: leave half-width kana as-is
print NKF.nkf('-Ew -m0 -x', File.read(ENV['TM_FILEPATH']))

You need to have nkf installed to use this command. The snippet above converts EUC into UTF-8.

My problem, where I see a diamond-like character at the end of each line (at least when I try to convert Shift-JIS to UTF-8), still exists. I can still use “Remove Unprintable Characters in Document / Selection” in the Text bundle. Does anybody know how this removal affects the file?
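One way to look into this is to check which bytes actually sit at the end of each line. The sketch below counts carriage returns in a file. It assumes the diamonds might be the CR half of CR/LF line endings that survived the conversion, which is a guess and not something confirmed here. The name count_cr is made up for illustration.

```shell
#!/bin/sh
# count_cr FILE
# Print the number of carriage-return (CR, 0x0D) bytes in FILE.
# The function name is an illustrative assumption.
count_cr() {
    # Delete everything except CR, count what is left, and trim the
    # padding some wc implementations add.
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
```

If count_cr on the converted file prints roughly the number of lines in it, leftover CRs are a plausible cause. A result of 0 would rule this guess out.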

Handling Japanese Encodings in TextMate

As we know, TextMate is, fairly understandably, biased toward UTF-8. I often encounter garbage text when opening files someone else made. Most of them are encoded in Shift-JIS.

After playing with Terminal, I found that the following command converts a file to UTF-8.

iconv -f shift-jis -t utf-8 text1.txt > text2.txt

I tried to convert the command above to a command in TextMate. I made a new command called “Encode Shift-JIS to UTF-8”, and I typed in the following in the text field:

iconv -f shift-jis -t utf-8 "$TM_FILEPATH"

The other settings are as follows:

Save: Nothing
Input: None
Output: Replace Document

However, the command didn’t work properly. I saw a diamond-like character at the end of each line. (Update: The diamond characters can be removed with “Remove Unprintable Characters in Document / Selection” in the Text bundle.) So I asked on the TextMate mailing list why this doesn’t work. According to Hans, the problem is due to the window/buffer using UTF-8. He suggested an alternative: type the following command in a TextMate document window.

iconv -f shift-jis -t utf-8 /PATH/TO/file.txt | cat

Then, you type Control-R, which is “Execute Line and Insert the Result”. He also shared his own bundle for opening encoded files, “OpenEncodedFile”, which is available from his website. Another item he shared on the list is a command that facilitates inserting file path names, called FileName Completion.tmCommand.

I think his way is better than using nkf, which you need to install. iconv and textutil come with every Mac.
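Since iconv is already on the system, the Shift-JIS to UTF-8 step and the CR/LF to LF step can be combined outside TextMate as well. The following is a rough sketch, not a tested recipe for every file: the function name sjis_to_utf8_lf is hypothetical, and it assumes that every CR in the input belongs to a CR/LF line ending and can simply be dropped.

```shell
#!/bin/sh
# sjis_to_utf8_lf SRC DST
# Convert SRC from Shift-JIS to UTF-8 and remove carriage returns, writing
# the result to DST. SRC is left untouched.
# The function name is an illustrative assumption.
sjis_to_utf8_lf() {
    src=$1
    dst=$2
    tmp=$(mktemp) || return 1
    # Convert first so an iconv failure is noticed before DST is written.
    if iconv -f shift-jis -t utf-8 "$src" > "$tmp"; then
        tr -d '\r' < "$tmp" > "$dst"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, sjis_to_utf8_lf text1.txt text2.txt writes a UTF-8 copy with Unix line endings and returns a non-zero status if iconv could not convert the input.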

Note: I will put a link to the mailing list archive as soon as the thread is available in the archive.

Double Byte Characters and MarsEdit Bookmarklet

Another bug for Daniel.

When the title of a web page contains double-byte characters such as Japanese, the bookmarklet “Post with MarsEdit” creates a post with garbage text (or, I’m assuming, Unicode entities).

[Image: MarsEdit Japanese Title]

The result of clicking the bookmarklet looks like this:

[Image: MarsEdit Unicode Garbage Thumbnail]
