Declaring the Character Encoding of Python Source Code
Run the following Python print statement:
print u'I "said" do not touch “this.""'
The string contains a Chinese (full-width) double quotation mark, and the Python interpreter rejects it. The error message is:
[wangy@bogon 文档]$ python ex1.py
File "ex1.py", line 7
SyntaxError: Non-ASCII character '\xe2' in file ex1.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
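The '\xe2' in the message is the first byte of the UTF-8 encoding of the left double quotation mark (U+201C). Here is a quick check, as a sketch that assumes a Python 3 interpreter (the article itself runs Python 2). The character is written as an escape so the snippet stays pure ASCII, and the variable names are only illustrative:

```python
# U+201C LEFT DOUBLE QUOTATION MARK, written as an escape to keep this snippet ASCII-only.
left_quote = u'\u201c'

# Its UTF-8 encoding is three bytes; the first is the 0xe2 named in the error message.
utf8_bytes = left_quote.encode('utf-8')
print(repr(utf8_bytes))

# ASCII has no representation for this character, so a parser that
# assumes ASCII source cannot accept the bytes above.
try:
    left_quote.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```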
See the link http://www.python.org/peps/pep-0263.html
Its main points are as follows:
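In Python 2.1, Unicode literals could only be written using the Latin-1 based "unicode-escape" encoding. Latin-1 is a Western European character encoding, so this caused considerable trouble for programmers in Asia and other non-Latin-1 locales.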
The solution is to declare the encoding of the source file, so that the interpreter knows how the source code is encoded.
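For comparison, the "unicode-escape" workaround mentioned above might look like the sketch below: the source stays pure ASCII and the Chinese quotation mark is spelled as \u201c. The variable name text is only illustrative.

```python
# Pure-ASCII source: the left double quotation mark is spelled as the escape \u201c.
text = u'I "said" do not touch \u201cthis.""'

# The escape becomes the real character when the literal is evaluated.
has_left_quote = u'\u201c' in text
print(has_left_quote)
```

This works without any encoding declaration, but it is hard to read and write for text that is mostly non-ASCII, which is the inconvenience the declaration is meant to remove.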
How to define the encoding:
Python will default to ASCII as standard encoding if no other encoding hints are given.
To define a source code encoding, a magic comment must be placed into the source
files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
It is best to use the first or the second form.
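As a rough illustration of how such a magic comment could be located, the sketch below scans the first two lines with a regular expression modeled on the one given in PEP 263. The helper name find_declared_encoding is hypothetical, and the sketch ignores BOM handling and does not check that the named codec exists.

```python
import re

# Modeled on the pattern described in PEP 263; treat the exact expression as an assumption.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def find_declared_encoding(source_text):
    """Return the encoding named on line 1 or 2 of source_text, or None."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# The three accepted styles, plus a file with no declaration at all.
print(find_declared_encoding("# coding=utf-8\nimport os\n"))
print(find_declared_encoding("#!/usr/bin/python\n# -*- coding: latin-1 -*-\n"))
print(find_declared_encoding("#!/usr/bin/python\n# vim: set fileencoding=utf-8 :\n"))
print(find_declared_encoding("#!/usr/bin/python\nimport os\n"))
```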
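The PEP specifically notes that some platforms, such as Windows, add Unicode BOM marks to the beginning of Unicode files. To accommodate this, the UTF-8 signature at the start of a file is itself interpreted as declaring 'utf-8', so no separate encoding declaration is needed in that case.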
If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
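The BOM rule can be observed with tokenize.detect_encoding from the standard library. Note the assumption: this is a Python 3 module, whereas the article runs Python 2, so it serves only as a sketch of the behaviour. The helper detect and the sample file contents are made up for illustration.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding name tokenize.detect_encoding reports for source_bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


body = b"import os, sys\n"

# BOM only, no magic comment: treated as UTF-8 (reported as 'utf-8-sig').
bom_only = detect(codecs.BOM_UTF8 + body)
print(bom_only)

# BOM plus a matching 'utf-8' comment: accepted.
bom_and_utf8 = detect(codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\n" + body)
print(bom_and_utf8)

# BOM plus a conflicting comment: rejected with SyntaxError.
try:
    detect(codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\n" + body)
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True
print(conflict_rejected)
```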
Examples
These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:
With interpreter binary and using Emacs style file encoding comment:
#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...
Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys
...
Text editors might have different ways of defining the file's encoding, e.g.:
#!/usr/local/bin/python
# coding: latin-1
import os, sys
...
Without encoding comment, Python's parser will assume ASCII text:
#!/usr/local/bin/python
import os, sys
...
Encoding comments which don't work:
Missing "coding:" prefix:
#!/usr/local/bin/python
# latin-1
import os, sys
...
Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...
Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
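The three failing cases above can be tried with Python 3's tokenize.detect_encoding, again only as a sketch. One assumption matters here: when no declaration is found, Python 3 falls back to 'utf-8', whereas the Python 2 interpreter used in this article falls back to ASCII. The helper detect is hypothetical.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding name tokenize.detect_encoding reports for source_bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


missing_prefix = b"#!/usr/local/bin/python\n# latin-1\nimport os, sys\n"
third_line = b"#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os, sys\n"
unsupported = b"#!/usr/local/bin/python\n# -*- coding: utf-42 -*-\nimport os, sys\n"

# In the first two cases the comment is not recognized, so the
# interpreter's default is reported instead of latin-1.
missing_prefix_result = detect(missing_prefix)
third_line_result = detect(third_line)
print(missing_prefix_result)
print(third_line_result)

# A recognized comment that names an unknown codec is an error.
try:
    detect(unsupported)
    unsupported_rejected = False
except SyntaxError:
    unsupported_rejected = True
print(unsupported_rejected)
```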
Modify the source code to add the declaration and save the file as UTF-8. The editor used here is gedit on Linux.
# -*- coding: utf-8 -*-
print "hello world!"
print "hello Again"
print "I like trying this"
print "This is fun"
print 'Yay! Printing'
print "I'd much rather you 'not'."
print u'I "said" 这里有中文双引号 “this.""'
The script now prints normally.