
Java


What is Unicode?

Unicode at the Windows command prompt (C++; .Net; Java)

Strange things can happen when working with characters. It is important to understand why problems occur and what can be done about them. This post is about getting Unicode to work at the Windows command prompt (cmd.exe).

This article requires your browser to be able to display Unicode characters, e.g. я. If you see a question mark there instead of a Cyrillic grapheme, some of this article may not make as much sense.

"Penny wise and pound foolish" - a character corruption example

Let's look at the pound symbol (£, the currency symbol) on Windows XP configured with British English regional settings.

If we save this data using Notepad and dump it at the console, the pound symbol is not printed:

    C:\demo>TYPE plaintext.txt
    abcú

If we copy the file to another machine (Ubuntu 8.10, British regional settings) and dump it to a console, we just get a question mark in place of the symbol:

    ~$ cat plaintext.txt
    abc?

Character encodings and code pages

    Character    a   b   c   £
    Byte (hex)   61  62  63  A3

Unicode

End notes

A rough guide to character encoding

It can be tricky to tell the difference between character handling code that works and code that only appears to work because testing did not encounter the cases that expose its bugs. This is a post about some of the pitfalls of character handling in Java. I wrote a little bit about Unicode before. This post might be exhausting, but it isn't exhaustive.

Unicode in source files

Java source files include support for Unicode. One choice is to encode the source files as Unicode, write the characters directly and inform the compiler of the encoding at compile time. javac provides the -encoding <encoding> option for this.
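As a rough illustration of why the declared encoding matters, the sketch below takes the UTF-8 bytes of a short string and decodes them twice: once as UTF-8 and once as a single-byte charset. The class and method names are made up for this example, and ISO-8859-1 is used as a stand-in for Cp1252 because every Java runtime is required to support it; the two charsets agree on the bytes involved here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {

    // Encode `text` as UTF-8, then decode those bytes as if they had been
    // written in `assumed`. This mimics a reader (or compiler) that is told
    // the wrong encoding for a UTF-8 file.
    static String reinterpret(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        // \u00A9 is the copyright sign. It is written as a Unicode escape so
        // that this file compiles the same way under any -encoding setting.
        String text = "\u00A9 Acme, Inc.";

        String right = reinterpret(text, StandardCharsets.UTF_8);
        String wrong = reinterpret(text, StandardCharsets.ISO_8859_1);

        // Booleans are printed instead of the strings themselves, because the
        // console encoding could distort non-ASCII output.
        System.out.println(right.equals(text));                      // prints "true"
        // The two UTF-8 bytes C2 A9 become two characters, U+00C2 and U+00A9.
        System.out.println(wrong.equals("\u00C2\u00A9 Acme, Inc.")); // prints "true"
    }
}
```

Writing the character as a Unicode escape, as the sketch does, is another way to keep a source file independent of the compiler's encoding setting.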

Code saved as UTF-8, as might be written on an Ubuntu machine:

    public class PrintCopyright {
        public static void main(String[] args) {
            System.out.println("© Acme, Inc.");
        }
    }

1. javac -encoding UTF-8 PrintCopyright.java
2. javac -encoding Cp1252 PrintCopyright.java

These compiler settings will produce different outputs; only the first one is correct.

Unicode and Java data types

Encodings

Notes

Java Tutorial: Converting Charset Encoding

Question: How can I use Java to convert a file encoded in GB18030 to UTF-16?

Here's the solution:

    import java.io.File;
    import java.io.Reader;
    import java.io.Writer;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;

    public class charrs {
        public static void main(String[] args) throws IOException {
            File infile = new File("/Users/t/test/gb18030.txt");
            File outfile = new File("/Users/t/test/utf16.txt");

            // try-with-resources closes both streams even if the copy fails
            try (Reader in = new InputStreamReader(new FileInputStream(infile), "GB18030");
                 Writer out = new OutputStreamWriter(new FileOutputStream(outfile), "UTF-16")) {
                int c;
                // read() returns one decoded char, or -1 at end of input
                while ((c = in.read()) != -1) {
                    out.write(c);
                }
            }
        }
    }

Note that three levels of classes are involved: File, FileInputStream, InputStreamReader. For the same task done with Python, Perl, Emacs, see: Python & Perl: Converting a File's Encoding.
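The same conversion can also be sketched with java.nio.file, which takes the charset as a Charset object and reads in blocks rather than one character at a time. The class name, method names and temporary file prefixes below are invented for this example. The built-in demo converts ISO-8859-1 to UTF-16 because both are guaranteed on every Java runtime; GB18030 can be passed on the command line if the runtime provides it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Transcode {

    // Copy `source` to `target`, decoding with `from` and encoding with `to`.
    // Unlike InputStreamReader, Files.newBufferedReader throws an IOException
    // on malformed input instead of silently substituting a replacement char.
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        try (Reader in = Files.newBufferedReader(source, from);
             Writer out = Files.newBufferedWriter(target, to)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    // Self-contained demo: writes "abc" plus a pound sign as ISO-8859-1,
    // transcodes to UTF-16, and returns the text read back from the result.
    static String roundTripDemo() {
        try {
            Path source = Files.createTempFile("latin1-", ".txt");
            Path target = Files.createTempFile("utf16-", ".txt");
            try {
                Files.write(source, "abc\u00A3".getBytes(StandardCharsets.ISO_8859_1));
                transcode(source, StandardCharsets.ISO_8859_1, target, StandardCharsets.UTF_16);
                return new String(Files.readAllBytes(target), StandardCharsets.UTF_16);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 4) {
            // Usage: java Transcode.java <in> <from-charset> <out> <to-charset>
            transcode(Paths.get(args[0]), Charset.forName(args[1]),
                      Paths.get(args[2]), Charset.forName(args[3]));
        } else {
            System.out.println(roundTripDemo().equals("abc\u00A3")); // prints "true"
        }
    }
}
```

For the question's case, the arguments would be the input path, GB18030, the output path and UTF-16.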

Java FAQ

The Java virtual machine (more precisely, the garbage collector) takes care of freeing memory cleanly once objects are no longer in use. However, a number of resources must be released explicitly, such as files, sockets or JDBC connections, because they hold system resources that the garbage collector cannot manage. These resources must be released through a dedicated method (usually named close()). It is not enough to call that method at the end of the processing, because there are a number of cases in which it would never be called (for example, when an exception is thrown or the method returns early). The cleanest solution is a try/finally block organized in three steps: acquire the resource, use it inside the try block, and release it in the finally block. This organization guarantees that resources are released cleanly in every case.
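The original FAQ's code listing is not reproduced here, so the following is only a minimal sketch of the three-step try/finally structure. TrackedResource is a hypothetical stand-in for a file, socket or JDBC connection; it records what happens to it so that the order of events can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

class TryFinallyDemo {

    // Hypothetical resource that logs its own lifecycle.
    static final class TrackedResource implements AutoCloseable {
        private final List<String> log;

        TrackedResource(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            log.add("use");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    static void process(List<String> log, boolean fail) {
        // 1. Acquire the resource before the try block, so that finally
        //    only runs when there is something to release.
        TrackedResource resource = new TrackedResource(log);
        try {
            // 2. Use the resource; this may throw or return early.
            resource.use(fail);
        } finally {
            // 3. Release the resource on every exit path.
            resource.close();
        }
    }

    // Runs process() and returns the recorded sequence of events.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            process(log, fail);
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints "[open, use, close]"
        System.out.println(run(true));  // prints "[open, close, caught]"
    }
}
```

In the failing run, "close" is logged before the exception reaches the caller. Since Java 7, a try-with-resources statement gives the same guarantee with less code for any AutoCloseable resource.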

Is my String Latin-1 compatible?
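The FAQ's own answer is not included in this excerpt. One possible check, sketched here as an assumption rather than as the FAQ's method, is to ask an ISO-8859-1 encoder whether it can encode the string. The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class Latin1Check {

    // Returns true when every character of `s` can be represented in
    // ISO-8859-1 (Latin-1). A new encoder is created on each call because
    // CharsetEncoder instances are not safe for concurrent use.
    static boolean isLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("caf\u00E9")); // e with acute accent: prints "true"
        System.out.println(isLatin1("\u20AC"));    // euro sign: prints "false"
    }
}
```

The euro sign is a useful test case because it exists in Cp1252 but not in ISO-8859-1.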

Scala