VMware Cloud Community
heikki_hyperic
Contributor

Non-ASCII characters not handled correctly for localized HQU plugins

The HQU plugin localization document (http://support.hyperic.com/confluence/display/hypcomm/HQU+Localization) is pretty brief, and there is just a single localization example in the svn trunk (the gconsole plugin). Unfortunately for me, its German localization contains only one string, and all of its characters are in the ASCII range.

I tried to localize my plugin to Finnish. The Finnish alphabet has three non-ASCII letters, the most commonly used being ä and ö and their uppercase forms. I read somewhere that the properties file needs to be in Latin-1 (ISO-8859-1), so I made sure of that and typed the letters normally. As an experiment, I also entered some letters as \u00e4, which is what native2ascii outputs for one of those special letters.

When I set the preferred language to Finnish in Firefox (Edit > Preferences... > Advanced > Languages), I see that my plugin did pick up the Finnish translation, but instead of the expected a and o with dots above I see mojibake like "Ã¤".

Is localization of HQU plugins supported yet? What else do I need to support non-ASCII characters?

In my specific case I could probably work around the issue by using the HTML entities for the few letters, but that won't really be an option for many other languages.
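For illustration, the symptom matches a classic charset mismatch: text encoded as UTF-8 bytes but decoded as Latin-1. A minimal Java sketch of that failure mode (the sample string is illustrative, not from HQ):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String finnish = "p\u00e4iv\u00e4"; // "päivä"
        // The server emits the page body as UTF-8 bytes...
        byte[] utf8 = finnish.getBytes(StandardCharsets.UTF_8);
        // ...but the browser decodes them as Latin-1, because that is
        // what the page effectively declares:
        String seen = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(seen); // each ä turns into the two characters "Ã¤"
    }
}
```

Each two-byte UTF-8 sequence becomes two separate Latin-1 characters, which is exactly the doubled-garbage pattern described above.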
5 Replies
admin
Immortal

Hello Heikki,

To be honest, I don't have a lot of experience with localization, even
though I'm the author of that code. The localization is tested, but as
you note, only with ASCII characters.

I'm not sure where the translation breaks down. Is there an HTTP
header that must be set for your locale? If you print the text to
stdout or a file from within a controller (just use
localeBundle.prop), does it contain the right data?

-- Jon



On Jun 16, 2008, at 5:32 PM, Heikki Toivonen wrote:

> Is localization of HQU plugins supported yet? What else do I need
> to support non-ASCII characters?


heikki_hyperic
Contributor

Hmm, I am not an expert either, but I have done some work in this area, although not with Java or Groovy. I'll see if I can spend some time on this.

It seems that Hyperic serves the plugin page with the header Content-Type: text/html;charset=UTF-8. AFAIK the .properties file must be in Latin-1 encoding, but because you can use the \uXXXX escape format it can still represent almost any character. At this point I don't know in what encoding the strings come out of it in code, or where that happens, but I bet you would need to re-encode them for whatever encoding you use for the page you are serving. UTF-8 can be a universal encoding, but it is more common to use different encodings for different languages. For example, if you go to baidu.cn you will note that it serves the page with content type "text/html;charset=gb2312".
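The Latin-1-plus-escapes behaviour can be sanity-checked with java.util.Properties, which is what backs a PropertyResourceBundle; this sketch feeds simulated file contents from a string rather than a real bundle file:

```java
import java.io.StringReader;
import java.util.Properties;

public class EscapeCheck {
    public static void main(String[] args) throws Exception {
        // Simulated .properties content: one raw Latin-1 letter and the
        // same letter written as a \u00e4 escape.
        String file = "raw=\u00e4\nescaped=\\u00e4\n";
        Properties p = new Properties();
        p.load(new StringReader(file));
        // Both spellings decode to the same single character 'ä':
        System.out.println(p.getProperty("raw").equals(p.getProperty("escaped"))); // true
    }
}
```

So by the time the strings leave the bundle they are ordinary Java strings; any corruption has to happen later, at the byte/encoding boundary of the rendered page.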

When I do log.info localeBundle.someKeyWithNonAscii, I do see my non-ASCII characters correctly in the log.

So I think the course of action now is to a) determine in what encoding the strings come out of the properties file, and b) make sure they get re-encoded to whatever encoding the page is served with (starting by hardcoding UTF-8).
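Step b) can be sketched in plain Java: once a string has been decoded correctly, what matters is that the bytes written to the response are produced with the same charset the Content-Type header declares (the names here are illustrative, not HQ APIs):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PageWriteSketch {
    public static void main(String[] args) throws Exception {
        String fromBundle = "S\u00e4\u00e4din"; // "Säädin", already decoded correctly
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        // Encode the body with the same charset the header will declare:
        try (Writer out = new OutputStreamWriter(page, StandardCharsets.UTF_8)) {
            out.write(fromBundle);
        }
        // A browser honouring "charset=UTF-8" decodes it back intact:
        String browserView = new String(page.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(browserView.equals(fromBundle)); // true
    }
}
```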
heikki_hyperic
Contributor

Hmm, some progress/clarifications. The terminal where I am watching the server log is set to UTF-8, i.e. the same encoding the web page is served with. If I print a line to the log as follows, I get the same erroneous output that appears on the web page (without these tricks):

log.info new String(localeBundle.someKeyWithNonAscii.getBytes(), "Latin1")
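Mentally executing that Groovy line in Java terms: getBytes() with no argument uses the JVM default charset (file.encoding), so on a UTF-8 system the expression takes UTF-8 bytes and relabels them as Latin-1, presumably the same mislabeling the page suffers. With the charsets spelled out explicitly to make it deterministic:

```java
import java.nio.charset.StandardCharsets;

public class LogLineCheck {
    public static void main(String[] args) {
        String s = "\u00e4"; // ä, as it comes out of localeBundle
        // Equivalent of new String(s.getBytes(), "Latin1") when the
        // default charset is UTF-8 ("Latin1" is an alias of ISO-8859-1):
        String relabeled = new String(s.getBytes(StandardCharsets.UTF_8),
                                      StandardCharsets.ISO_8859_1);
        System.out.println(relabeled.length()); // 2 — one letter became two
    }
}
```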
heikki_hyperic
Contributor

If I display my HQU plugin "standalone" (i.e. go to a URL like this: http://localhost:7080/hqu/cool/freezer/index.hqu), I can fix the encoding by passing the contentType:"text/html; charset=UTF-8" parameter to render().

This still does not fix the case where the plugin content is displayed normally (between the Hyperic header and footer parts). It seems some default still sets the encoding to Latin-1 in that case, but I haven't found where that happens yet.

Of course, hardcoding UTF-8 is not the real solution either; there should be a way to find out what encoding the wrapping page uses and match it. Needless to say, I haven't found where that is determined either.
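If the wrapping page's Content-Type value can be obtained from the response object, its charset parameter could be pulled out with something like this hypothetical helper (charsetOf is not an HQ API, just a sketch):

```java
public class CharsetSniffer {
    // Hypothetical helper: extract the charset parameter from a
    // Content-Type header value, with a fallback when none is declared.
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    return p.substring(8).trim().replace("\"", "");
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8", "ISO-8859-1")); // UTF-8
        System.out.println(charsetOf("text/html", "ISO-8859-1"));               // ISO-8859-1
    }
}
```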
jtravis_hyperic
Hot Shot

Heikki,

Yes, I'm not sure what dictates the encoding of the wrapping page. My guess is that there is a Tomcat configuration which takes care of the default encoding. In the case of HQU, we're just using the Tomcat ServletResponse object, which has already been set up for the request.