龙之舞ing's Blog

Guessing the Encoding from Garbled Text

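Sometimes when Java produces garbled text (mojibake), I want to know which charset the text was encoded with and which charset it was then decoded with, so that I can quickly adjust the charset and fix the problem. But garbled text all looks like garbage, so how can you tell which charsets were involved? I had some free time today, so I wrote a demo to try it out. The code and the results are as follows: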

`import java.nio.charset.Charset;
import org.apache.commons.lang3.StringUtils;

String source = "中文测试";
Charset gbkCharset = Charset.forName("gbk");
Charset utf8Charset = Charset.forName("utf-8");
Charset iso88591Charset = Charset.forName("iso-8859-1");

Charset defaultCharset = Charset.defaultCharset();

System.out.printf("defaultCharset:%s%n", defaultCharset);

System.out.println(StringUtils.repeat("=",20));

// Each label reads "charset used to encode => charset used to decode".

// Encode as GBK, decode as UTF-8
String str1 = StringUtils.toEncodedString(source.getBytes(gbkCharset), utf8Charset);
System.out.printf("gbk=>utf-8:%s%n", str1);

// Encode as UTF-8, decode as GBK
String str4 = StringUtils.toEncodedString(source.getBytes(utf8Charset), gbkCharset);
System.out.printf("utf-8=>gbk:%s%n", str4);

// Encode as ISO-8859-1 (cannot represent Chinese), decode as UTF-8
String str2 = StringUtils.toEncodedString(source.getBytes(iso88591Charset), utf8Charset);
System.out.printf("iso8859-1=>utf-8:%s%n", str2);

// Encode as UTF-8, decode as ISO-8859-1
String str5 = StringUtils.toEncodedString(source.getBytes(utf8Charset), iso88591Charset);
System.out.printf("utf-8=>iso8859-1:%s%n", str5);

// Encode as GBK, decode as ISO-8859-1
String str3 = StringUtils.toEncodedString(source.getBytes(gbkCharset), iso88591Charset);
System.out.printf("gbk=>iso8859-1:%s%n", str3);

// Encode as ISO-8859-1 (cannot represent Chinese), decode as GBK
String str6 = StringUtils.toEncodedString(source.getBytes(iso88591Charset), gbkCharset);
System.out.printf("iso8859-1=>gbk:%s%n", str6);`

Output:

defaultCharset:UTF-8
=========================
gbk=>utf-8:���IJ���
utf-8=>gbk:涓枃娴嬭瘯
iso8859-1=>utf-8:????
utf-8=>iso8859-1:ä¸­ææµè¯
gbk=>iso8859-1:ÖÐÎÄ²âÊÔ
iso8859-1=>gbk:????
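
One way to see why each combination garbles differently is to look at the raw bytes each charset produces. The sketch below is not part of the original demo: the class name `EncodingBytes` and the helper `hex` are made up for illustration, and it assumes the JDK in use ships the GBK charset. The source string is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats bytes as space-separated upper-case hex, e.g. "D6 D0".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "\u4E2D\u6587\u6D4B\u8BD5"; // "中文测试"
        Charset gbk = Charset.forName("GBK");       // assumed to be available

        // prints: gbk:        D6 D0 CE C4 B2 E2 CA D4
        System.out.println("gbk:        " + hex(source.getBytes(gbk)));
        // prints: utf-8:      E4 B8 AD E6 96 87 E6 B5 8B E8 AF 95
        System.out.println("utf-8:      " + hex(source.getBytes(StandardCharsets.UTF_8)));
        // prints: iso-8859-1: 3F 3F 3F 3F
        System.out.println("iso-8859-1: " + hex(source.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The four characters take 8 bytes in GBK and 12 in UTF-8. ISO-8859-1 cannot represent them at all, so `getBytes` substitutes `?` (0x3F) for each one, which is why both iso8859-1 rows print `????` no matter which charset decodes them.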

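I wrote the demo in IDEA, and the project's source files use UTF-8 by default. Each mismatch leaves its own pattern: replacement characters (�) when GBK bytes are decoded as UTF-8, rare CJK characters when UTF-8 bytes are decoded as GBK, accented Latin letters when GBK bytes are decoded as ISO-8859-1, and plain question marks whenever the text was encoded with ISO-8859-1. So can we use the look of the garbled characters to roughly work out which charset encoded the text and which one decoded it, and then adjust the encoding accordingly?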
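
The same idea can be run in reverse: take a garbled string, undo every candidate pair of charsets, and see which result is readable. This is only a rough sketch under assumptions, not a tested tool. The class `MojibakeGuesser`, its methods `repair` and `candidates`, and the three-charset candidate list are all hypothetical names and choices made for illustration, and GBK is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MojibakeGuesser {
    // Candidate charsets to try (an illustrative choice, same three as the demo).
    static final Charset[] CANDIDATES = {
        Charset.forName("GBK"), StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1
    };

    // Undoes one wrong decode: turn the garbled text back into bytes with the
    // charset that wrongly decoded it, then decode those bytes with the
    // charset they were really written in.
    public static String repair(String garbled, Charset wronglyDecodedAs, Charset actuallyEncodedAs) {
        return new String(garbled.getBytes(wronglyDecodedAs), actuallyEncodedAs);
    }

    // Tries every ordered pair of distinct candidates. Keys follow the demo's
    // "encoded=>decoded" labels; values are the attempted repairs.
    public static Map<String, String> candidates(String garbled) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Charset encodedAs : CANDIDATES) {
            for (Charset decodedAs : CANDIDATES) {
                if (encodedAs.equals(decodedAs)) {
                    continue;
                }
                String label = encodedAs.name() + "=>" + decodedAs.name();
                result.put(label, repair(garbled, decodedAs, encodedAs));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The gbk=>iso8859-1 garbling of "中文测试", written with escapes.
        String garbled = "\u00D6\u00D0\u00CE\u00C4\u00B2\u00E2\u00CA\u00D4";
        // The entry for text encoded as GBK and decoded as ISO-8859-1 is the
        // one that reads 中文测试; the other entries stay garbled.
        candidates(garbled).forEach((label, text) -> System.out.println(label + ": " + text));
    }
}
```

Reversal only works when the wrong decode lost nothing. ISO-8859-1 maps every byte to a character, so text wrongly decoded with it can be recovered. Once the garbled text contains `�` or `?`, the original bytes are gone and no pair of charsets will bring them back.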

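Disclaimer:

I have not written systematic unit tests for this and do not know how reliable the approach is. It is for reference only.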