数据文件格式:
1 2 3 4 5 | Aeschylus time as he grows old teaches many lessons Alexander Graham Bell Mr.Watson,come here.i want you! Benjamin Franklin it is hard for an empty bag to stand upright Benjamin Franklin little strokes fell great oaks --字段之间是tab,其它是空格 |
导入命令:
1 2 3 4 | drop table aa; create table aa(a varchar (40),tt text); load data local infile 'a.txt' into table aa; select * from aa; |
过程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | root@localhost[lhrdb]> drop table aa; Query OK, 0 rows affected (0.17 sec) root@localhost[lhrdb]> create table aa(a varchar (40),tt text); Query OK, 0 rows affected (0.30 sec) root@localhost[lhrdb]> load data local infile 'a.txt' into table aa; Query OK, 4 rows affected (0.07 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 root@localhost[lhrdb]> select * from aa; + -----------------------+----------------------------------------------+ | a | tt | + -----------------------+----------------------------------------------+ | Aeschylus | time as he grows old teaches many lessons | | Alexander Graham Bell | Mr.Watson,come here.i want you! | | Benjamin Franklin | it is hard for an empty bag to stand upright | | Benjamin Franklin | little strokes fell great oaks | + -----------------------+----------------------------------------------+ 4 rows in set (0.00 sec) |
帮助:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | root@localhost[lhrdb]> help load data Name : 'LOAD DATA' Description: Syntax: LOAD DATA [LOW_PRIORITY | CONCURRENT] [ LOCAL ] INFILE 'file_name' [ REPLACE | IGNORE ] INTO TABLE tbl_name [PARTITION (partition_name,...)] [ CHARACTER SET charset_name] [{FIELDS | COLUMNS} [TERMINATED BY 'string' ] [[OPTIONALLY] ENCLOSED BY 'char' ] [ESCAPED BY 'char' ] ] [LINES [STARTING BY 'string' ] [TERMINATED BY 'string' ] ] [ IGNORE number {LINES | ROWS }] [(col_name_or_user_var,...)] [ SET col_name = expr,...] The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table , use LOAD DATA INFILE. The syntax of the FIELDS and LINES clauses is the same for both statements. Both clauses are optional, but FIELDS must precede LINES if both are specified. You can also load data files by using the mysqlimport utility; it operates by sending a LOAD DATA INFILE statement to the server. The --local option causes mysqlimport to read data files from the client host. You can specify the --compress option to get better performance over slow networks if the client and server support the compressed protocol. See http://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html. For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, see http://dev.mysql.com/doc/refman/5.7/en/ insert -optimization.html. The file name must be given as a literal string. On Windows, specify backslashes in path names as forward slashes or doubled backslashes. The character_set_filesystem system variable controls the interpretation of the file name . LOAD DATA supports explicit partition selection using the PARTITION option with a comma-separated list of one or more names of partitions, subpartitions, or both. When this option is used, if any rows from the file cannot be inserted into any of the partitions or subpartitions named in the list, the statement fails with the error Found a row not matching the given partition set . For more information, see http://dev.mysql.com/doc/refman/5.7/en/partitioning-selection.html. For partitioned tables using storage engines that employ table locks, such as MyISAM, LOAD DATA cannot prune any partition locks. This does not apply to tables using storage engines which employ row- level locking, such as InnoDB. For more information, see http://dev.mysql.com/doc/refman/5.7/en/partitioning-limitations-locking .html. The server uses the character set indicated by the character_set_database system variable to interpret the information in the file. SET NAMES and the setting of character_set_client do not affect interpretation of input. If the contents of the input file use a character set that differs from the default , it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies "no conversion." LOAD DATA INFILE interprets all fields in the file as having the same character set , regardless of the data types of the columns into which field values are loaded. For proper interpretation of file contents, you must ensure that it was written with the correct character set . For example, if you write a data file with mysqldump -T or by issuing a SELECT ... INTO OUTFILE statement in mysql, be sure to use a --default-character-set option so that output is written in the character set to be used when the file is loaded with LOAD DATA INFILE. *Note*: It is not possible to load data files that use the ucs2, utf16, utf16le, or utf32 character set . If you use LOW_PRIORITY, execution of the LOAD DATA statement is delayed until no other clients are reading from the table . This affects only storage engines that use only table - level locking (such as MyISAM, MEMORY, and MERGE). If you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is , it contains no free blocks in the middle), other threads can retrieve data from the table while LOAD DATA is executing. This option affects the performance of LOAD DATA a bit , even if no other thread is using the table at the same time . With row-based replication, CONCURRENT is replicated regardless of MySQL version. With statement-based replication CONCURRENT is not replicated prior to MySQL 5.5.1 (see Bug #34628). For more information, see http://dev.mysql.com/doc/refman/5.7/en/replication-features- load -data.h tml. The LOCAL keyword affects expected location of the file and error handling, as described later. LOCAL works only if your server and your client both have been configured to permit it. For example, if mysqld was started with the local_infile system variable disabled, LOCAL does not work . See http://dev.mysql.com/doc/refman/5.7/en/ load -data- local .html. The LOCAL keyword affects where the file is expected to be found: o If LOCAL is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name , the name is interpreted relative to the directory in which the client program was started. When using LOCAL with LOAD DATA, a copy of the file is created in the server 's temporary directory. This is not the directory determined by the value of tmpdir or slave_load_tmpdir, but rather the operating system' s temporary directory, and is not configurable in the MySQL Server. (Typically the system temporary directory is /tmp on Linux systems and C:\WINDOWS\ TEMP on Windows.) Lack of sufficient space for the copy in this directory can cause the LOAD DATA LOCAL statement to fail. o If LOCAL is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file: o If the file name is an absolute path name , the server uses it as given. o If the file name is a relative path name with one or more leading components, the server searches for the file relative to the server 's data directory. o If a file name with no leading components is given, the server looks for the file in the database directory of the default database. In the non-LOCAL case, these rules mean that a file named as ./myfile.txt is read from the server' s data directory, whereas the file named as myfile.txt is read from the database directory of the default database . For example, if db1 is the default database , the following LOAD DATA statement reads the file data.txt from the database directory for db1, even though the statement explicitly loads the file into a table in the db2 database : LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table; Non- LOCAL load operations read text files located on the server. For security reasons, such operations require that you have the FILE privilege. See http://dev.mysql.com/doc/refman/5.7/en/ privileges -provided.html. Also, non- LOCAL load operations are subject to the secure_file_priv system variable setting. If the variable value is a nonempty directory name , the file to be loaded must be located in that directory. If the variable value is empty (which is insecure), the file need only be readable by the server. Using LOCAL is a bit slower than letting the server access the files directly, because the contents of the file must be sent over the connection by the client to the server. On the other hand, you do not need the FILE privilege to load local files. LOCAL also affects error handling: o With LOAD DATA INFILE, data-interpretation and duplicate- key errors terminate the operation. o With LOAD DATA LOCAL INFILE, data-interpretation and duplicate- key errors become warnings and the operation continues because the server has no way to stop transmission of the file in the middle of the operation. For duplicate- key errors, this is the same as if IGNORE is specified. IGNORE is explained further later in this section . The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values : o If you specify REPLACE , input rows replace existing rows . In other words, rows that have the same value for a primary key or unique index as an existing row. See [HELP REPLACE ]. o If you specify IGNORE , rows that duplicate an existing row on a unique key value are discarded. For more information, see http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html# ignore -strict-co mparison. o If you do not specify either option , the behavior depends on whether the LOCAL keyword is specified. Without LOCAL , an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL , the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation. URL: http://dev.mysql.com/doc/refman/5.7/en/ load -data.html |
官网:
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
基本语法:
load data [low_priority] [local] infile 'file_name txt' [replace | ignore]
into table tbl_name
[fields
[terminated by't']
[OPTIONALLY] enclosed by '']
[escaped by'\' ]]
[lines terminated by'n']
[ignore number lines]
[(col_name, )]
load data infile 语句从一个文本文件中以很高的速度读入一个表中。 使用这个命令之前,mysqld进程(服务)必须已经在运行。 为了安全原因,当读取位于服务器上的 文本文件时,文件必须处于数据库目录或可被所有人读取。另外,为了对服务器上文件使用 load data infile ,在服务器主机上你必须有 file 的权限。
1 如果你指定关键词 low_priority , 那么MySQL将会等到没有其他人读这个表的时候,
才把插入数据。可以使用如下的命令:
load data low_priority infile "/home/mark/data sql" into table Orders;
2 如果指定 local 关键词,则表明从客户主机读文件。如果 local 没指定,文件必须位于服务器上。
3 replace 和 ignore 关键词控制对现有的唯一键记录的重复的处理。如果你指定 replace ,
新行将代替有相同的 唯一键值的现有行。如果你指定 ignore ,跳过有唯一键的现有行的
重复行的输入。如果你不指定任何一个选项,当找到重复键时,出现一个错误,
并且文本 文件的余下部分被忽略。例如:
load data low_priority infile "/home/mark/data sql" replace into table Orders;
4 分隔符
(1) fields关键字指定了文件记段的分割格式,如果用到这个关键字,MySQL剖析器希望
看到至少有下面的一个选项:
terminated by 分隔符:意思是以什么字符作为分隔符
enclosed by 字段括起字符
escaped by 转义字符
terminated by 描述字段的分隔符,默认 情况下是tab字符(\t)
enclosed by描述的是字段的括起字符。
escaped by 描述的转义字符。默认的是反斜杠 (backslash:\ )
例如:load data infile "/home/mark/Orders txt" replace into table Orders fields terminated by',' enclosed by '"';
(2)lines 关键字指定了每条记录的分隔符默认为 '\n' 即为换行符
如果两个字段都指定了 那 fields 必须在 lines 之前。 如果不指定 fields 关键字缺省值 与如果你这样
写的相同: fields terminated by'\t' enclosed by ’ '' ‘ escaped by'\\'
如果你不指定一个 lines 子句,缺省值与 如果你这样写的相同: lines terminated by'\n'
例如: load data infile "/jiaoben/load.txt" replace into table test fields terminated by ',' lines terminated by '/n';
5 load data infile 可以按指定的列把文件导入到数据库中。 当我们要把数据的一部分内容
导入的时候,,需要加入一些栏目(列/字段/field)到MySQL数据库中,以适应一些额外的
需要。比方说,我们要从 Access数据库升级到MySQL数据库的时候
下面的例子显示了如何向指 定的栏目(field)中导入数据:
load data infile "/home/Order txt" into table Orders(Order_Number, Order_Date, Customer_ID);
6 当在服务器主机上寻找文件时,服 务器使用下列规则:
( 1 )如果给出一个绝对路径名,服务器使用该路径名。
( 2 )如果给出一个有一个或多个前置部件的相对路径名,服务器相对服务器的数据目录
搜索文件。
( 3 )如果给出一个没有前置部件的一个 文件名,服务器在当前数据库的数据库目录寻找文件。
例如: /myfile txt” ="color: red">给出的文件是从服务器的数据目录读取,而作为 “myfile txt” 给出的一个文 件
是从当前数据库的数据库目录下读取。
注意:字段中的空值用 \N 表示
我的文章一般浅显易懂,不会搞那么深入让大家很难理解。(其实我水平也不咋样)
LOAD DATA INFILE 一直被认为是MySQL很强大的一个数据导入工具,因为他速度非常的快。
不过有几个问题一定要注意
1、编码。
2、灵活导入导出。
LOAD DATA [LOW_PRIORITY] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE]
INTO TABLE tbl_name
[FIELDS
[TERMINATED BY '\t']
[OPTIONALLY] ENCLOSED BY '']
[ESCAPED BY '\\' ]]
[LINES TERMINATED BY '\n']
[IGNORE number LINES]
[(col_name,...)]
LOAD DATA INFILE语句从一个文本文件中以很高的速度读入一个表中。如果指定LOCAL关键词,从客户主机读文件。如果LOCAL没指定,文件必须位于 服务器上。 (LOCAL在MySQL3.22.6或以后版本中可用。)
为了 安全原因,当读取位于服务器上的文本文件时,文件必须处于数据库目录或可被所有人读取。另外,为了对服务器上文件使用LOAD DATA INFILE, 在服务器主机上你必须有file的权限。见6.5 由MySQL提供的权限。
如果你指定关键词LOW_PRIORITY,LOAD DATA语句的执行被推迟到没有其他客户读取表后。
使用LOCAL将比让服务器直接存取文件慢些,因为文件的内容必须从客户主机传送到服务器主机。在另一方面,你不需要file权限装载本地文件。
你也可以使用mysqlimport实用程序装载数据文件;它由发送一个LOAD DATA INFILE命令到服务器来运作。 --local选项使得mysqlimport从客户主机上读取数据。如果客户和服务器支持压缩协议,你能指定--compress在较慢的 网络上获得更好的性能。
当在服务器主机上寻找文件时,服务器使用下列规则:
如果给出一个绝对路径名,服务器使用该路径名。
如果给出一个有一个或多个前置部件的相对路径名,服务器相对服务器的数据目录搜索文件。
如果给出一个没有前置部件的一个文件名,服务器在当前数据库的数据库目录寻找文件。
注意这些规则意味着一个像“./myfile.txt”给出的文件是从服务器的数据目录读取,而作为“myfile.txt”给出的一个文件是从当前数据库的数据库目录下读取。也要注意,对于下列哪些语句,对db1文件从数据库目录读取,而不是db2:
mysql> USE db1;
mysql> LOAD DATA INFILE "./data.txt" INTO TABLE db2.my_table;
REPLACE和IGNORE关键词控制对现有的唯一键记录的重复的处理。如果你指定REPLACE,新行将代替有相同的唯一键值的现有行。如果你指定IGNORE,跳过有唯一键的现有行的重复行的输入。如果你不指定任何一个选项,当找到重复键键时,出现一个错误,并且文本文件的余下部分被忽略时。
如果你使用LOCAL关键词从一个本地文件装载数据,服务器没有办法在操作的当中停止文件的传输,因此缺省的行为好像IGNORE被指定一样。
LOAD DATA INFILE是SELECT ... INTO OUTFILE的逆操作,
SELECT句法。为了将一个数据库的数据写入一个文件,使用SELECT ... INTO OUTFILE,为了将文件读回数据库,使用LOAD DATA INFILE。两个命令的FIELDS和LINES子句的语法是相同的。两个子句是可选的,但是如果指定两个,FIELDS必须在LINES之前。
如果你指定一个FIELDS子句,它的每一个子句(TERMINATED BY, [OPTIONALLY] ENCLOSED BY和ESCAPED BY)也是可选的,除了你必须至少指定他们之一。
如果你不指定一个FIELDS子句,缺省值与如果你这样写的相同:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
如果你不指定一个LINES子句,缺省值与如果你这样写的相同:
LINES TERMINATED BY '\n'
换句话说,缺省值导致读取输入时,LOAD DATA INFILE表现如下:
在换行符处寻找行边界
在定位符处将行分进字段
不要期望字段由任何引号字符封装
将由“\”开头的定位符、换行符或“\”解释是字段值的部分字面字符
相反,缺省值导致在写入输出时,SELECT ... INTO OUTFILE表现如下:
在字段之间写定位符
不用任何引号字符封装字段
使用“\”转义出现在字段中的定位符、换行符或“\”字符
在行尾处写换行符
注意,为了写入FIELDS ESCAPED BY '\\',对作为一条单个的反斜线被读取的值,你必须指定2条反斜线值。
IGNORE number LINES选项可被用来忽略在文件开始的一个列名字的头:
mysql> LOAD DATA INFILE "/tmp/file_name" into table test IGNORE 1 LINES;
当你与LOAD DATA INFILE一起使用SELECT ... INTO OUTFILE将一个数据库的数据写进一个文件并且随后马上将文件读回数据库时,两个命令的字段和处理选项必须匹配,否则,LOAD DATA INFILE将不能正确解释文件的内容。假定你使用SELECT ... INTO OUTFILE将由逗号分隔的字段写入一个文件:
mysql> SELECT * FROM table1 INTO OUTFILE 'data.txt'
FIELDS TERMINATED BY ','
FROM ...
为了将由逗号分隔的文件读回来,正确的语句将是:
mysql> LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY ',';
相反,如果你试图用下面显示的语句读取文件,它不会工作,因为它命令LOAD DATA INFILE在字段之间寻找定位符:
mysql> LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
可能的结果是每个输入行将被解释为单个的字段。
LOAD DATA INFILE能被用来读取从外部来源获得的文件。例如,以dBASE格式的文件将有由逗号分隔并用双引号包围的字段。如果文件中的行由换行符终止,下面显示的命令说明你将用来装载文件的字段和行处理选项:
mysql> LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
任何字段或行处理选项可以指定一个空字符串('')。如果不是空,FIELDS [OPTIONALLY] ENCLOSED BY和FIELDS ESCAPED BY值必须是一个单个字符。FIELDS TERMINATED BY和LINES TERMINATED BY值可以是超过一个字符。例如,写入由回车换行符对(CR+LF)终止的行,或读取包含这样行的一个文件,指定一个LINES TERMINATED BY '\r\n'子句。
FIELDS [OPTIONALLY] ENCLOSED BY控制字段的包围字符。对于输出(SELECT ... INTO OUTFILE),如果你省略OPTIONALLY,所有的字段由ENCLOSED BY字符包围。对于这样的输出的一个例子(使用一个逗号作为字段分隔符)显示在下面:
"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"
如果你指定OPTIONALLY,ENCLOSED BY字符仅被用于包围CHAR和VARCHAR字段:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20
注意,一个字段值中的ENCLOSED BY字符的出现通过用ESCAPED BY字符作为其前缀来转义。也要注意,如果你指定一个空ESCAPED BY值,可能产生不能被LOAD DATA INFILE正确读出的输出。例如,如果转义字符为空,上面显示的输出显示如下。注意到在第四行的第二个字段包含跟随引号的一个逗号,它(错误地)好象要终止字段:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
对于输入,ENCLOSED BY字符如果存在,它从字段值的尾部被剥去。(不管是否指定OPTIONALLY都是这样;OPTIONALLY对于输入解释不起作用)由ENCLOSED BY字符领先的ESCAPED BY字符出现被解释为当前字段值的一部分。另外,出现在字段中重复的ENCLOSED BY被解释为单个ENCLOSED BY字符,如果字段本身以该字符开始。例如,如果ENCLOSED BY '"'被指定,引号如下处理:
"The ""BIG"" boss" -> The "BIG" boss
The "BIG" boss -> The "BIG" boss
The ""BIG"" boss -> The ""BIG"" boss
FIELDS ESCAPED BY控制如何写入或读出特殊字符。如果FIELDS ESCAPED BY字符不是空的,它被用于前缀在输出上的下列字符:
FIELDS ESCAPED BY字符
FIELDS [OPTIONALLY] ENCLOSED BY字符
FIELDS TERMINATED BY和LINES TERMINATED BY值的第一个字符
ASCII 0(实际上将后续转义字符写成 ASCII'0',而不是一个零值字节)
如果FIELDS ESCAPED BY字符是空的,没有字符被转义。指定一个空转义字符可能不是一个好主意,特别是如果在你数据中的字段值包含刚才给出的表中的任何字符。
对于输入,如果FIELDS ESCAPED BY字符不是空的,该字符的出现被剥去并且后续字符在字面上作为字段值的一个部分。例外是一个转义的“0”或“N”(即,\0或\N,如果转义字符是“\”)。这些序列被解释为ASCII 0(一个零值字节)和NULL。见下面关于NULL处理的规则。
对于更多关于“\”- 转义句法的信息,在某些情况下,字段和行处理选项相互作用:
如果LINES TERMINATED BY是一个空字符串并且FIELDS TERMINATED BY是非空的,行也用FIELDS TERMINATED BY终止。
如果FIELDS TERMINATED BY和FIELDS ENCLOSED BY值都是空的(''),一个固定行(非限定的)格式被使用。用固定行格式,在字段之间不使用分隔符。相反,列值只用列的“显示”宽度被写入和读出。例如,如果列被声明为INT(7),列的值使用7个字符的字段被写入。对于输入,列值通过读取7个字符获得。固定行格式也影响NULL值的处理;见下面。注意如果你正在使用一个多字节字符集,固定长度格式将不工作。
NULL值的处理有多种,取决于你使用的FIELDS和LINES选项:
对于缺省FIELDS和LINES值,对输出,NULL被写成\N,对输入,\N被作为NULL读入(假定ESCAPED BY字符是“\”)。
如果FIELDS ENCLOSED BY不是空的,包含以文字词的NULL作为它的值的字段作为一个NULL值被读入(这不同于包围在FIELDS ENCLOSED BY字符中的字NULL,它作为字符串'NULL'读入)。
如果FIELDS ESCAPED BY是空的,NULL作为字NULL被写入。
用固定行格式(它发生在FIELDS TERMINATED BY和FIELDS ENCLOSED BY都是空的时候),NULL作为一个空字符串被写入。注意,在写入文件时,这导致NULL和空字符串在表中不能区分,因为他们都作为空字符串被写入。如果在读回文件时需要能区分这两者,你应该不使用固定行格式。
一些不被LOAD DATA INFILE支持的情况:
固定长度的行(FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空)和BLOB或TEXT列。
如果你指定一个分隔符与另一个相同,或是另一个的前缀,LOAD DATA INFILE不能正确地解释输入。例如,下列FIELDS子句将导致问题:
FIELDS TERMINATED BY '"' ENCLOSED BY '"'
如果FIELDS ESCAPED BY是空的,一个包含跟随FIELDS TERMINATED BY值之后的FIELDS ENCLOSED BY或LINES TERMINATED BY的字段值将使得LOAD DATA INFILE过早地终止读取一个字段或行。这是因为LOAD DATA INFILE不能正确地决定字段或行值在哪儿结束。
下列例子装载所有persondata表的行:
mysql> LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;
没有指定字段表,所以LOAD DATA INFILE期望输入行对每个表列包含一个字段。使用缺省FIELDS和LINES值。
如果你希望仅仅装载一张表的某些列,指定一个字段表:
mysql> LOAD DATA INFILE 'persondata.txt'
INTO TABLE persondata (col1,col2,...);
如果在输入文件中的字段顺序不同于表中列的顺序,你也必须指定一个字段表。否则,MySQL不能知道如何匹配输入字段和表中的列。
如果一个行有很少的字段,对于不存在输入字段的列被设置为缺省值。
如果字段值缺省,空字段值有不同的解释:
对于字符串类型,列被设置为空字符串。
对于数字类型,列被设置为0。
对于日期和时间类型,列被设置为该类型的适当“零”值。
如果列有一个NULL,或(只对第一个TIMESTAMP列)在指定一个字段表时,如果TIMESTAMP列从字段表省掉,TIMESTAMP列只被设置为当前的日期和时间。
如果输入行有太多的字段,多余的字段被忽略并且警告数字加1。
LOAD DATA INFILE认为所有的输入是字符串,因此你不能像你能用INSERT语句的ENUM或SET列的方式使用数字值。所有的ENUM和SET值必须作为字符串被指定!
如果你正在使用C API,当LOAD DATA INFILE查询完成时,你可通过调用API函数mysql_info()得到有关查询的信息。信息字符串的格式显示在下面:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
当值通过INSERT语句插入时,在某些情况下出现警告,除了在输入行中有太少或太多的字段时,LOAD DATA INFILE也产生警告。警告没被 存储在任何地方;警告数字仅能用于表明一切是否顺利。如果你得到警告并且想要确切知道你为什么得到他们,一个方法是使用SELECT ... INTO OUTFILE到另外一个文件并且把它与你的原版输入文件比较
load data语句详解
1、语法解析
1.1. 必选子句或关键字
1.2. 可选子句或关键字
1.2.1. LOW_PRIORITY关键字
1.2.2. LOCAL关键字
1.2.2.1. 使用与不使用local关键字的流程
1.2.2.2. 使用local关键字的错误处理
1.2.3. REPLACE与IGNORE关键字
1.2.4. PARTITION子句
1.2.5. CHARACTER SET charset_name子句
1.2.6. FIELDS(与COLUMNS关键字相同)和LINES子句
1.2.6.1. FIELDS关键字及其子句详解
1.2.6.2. LINES 关键字及其子句详解
1.2.6.3. FIELDS和LINES注意事项
1.2.7. IGNORE number {LINES | ROWS}子句
1.2.8. (col_name_or_user_var,…)指定字段名称的子句
1.2.8. SET col_name = expr,…子句
2、批量导出和批量导入
2.1. 使用mysqldump批量导出
2.2. 使用mysqimport批量导出
2.2.1. mysqlimport语法及其参数说明
2.2.2. mysqlimport用法演示示例
2.2.2.1. 单表导入
2.2.2.2.多表导入
3、总结
背景
数据库版本:MySQL 5.7.18
服务器信息:本地到处导入在10.10.30.241上演示,local远程导入在10.10.30.250上演示
数据库参数配置:
* 双一,secure_file_priv='',log-bin,binlog_format=row,隔离级别RC,sql_mode='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
参考资料: https://dev.mysql.com/doc/refman/5.7/en/load-data.html
制造测试数据
admin@localhost : (none) 09:28:29> use xiaoboluo
Database changed
admin@localhost : xiaoboluo 09:28:31> show tables;
+---------------------+
| Tables_in_xiaoboluo |
+---------------------+
| test |
| test2 |
+---------------------+
2 rows in set (0.00 sec)
admin@localhost : xiaoboluo 09:36:07> create table test3(id int unsigned not null primary key auto_increment,test varchar(100),test2 varchar(100));
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 09:36:47> insert into test3(test,test2) values('a string','100.20'),('a string containing a , comma','102.20'),('a string containing a " quote','102.20'),\
('a string containing a ", quote and comma','102.20');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
admin@localhost : xiaoboluo 09:40:33> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
查看语法帮助信息
admin@localhost : xiaoboluo 10:14:21> help load data;
Name: 'LOAD DATA'
Description:
Syntax:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[PARTITION (partition_name,...)]
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(col_name_or_user_var,...)]
[SET col_name = expr,...]
.....
load data语句加载的数据源可以是mysqldump导出的纯文本数据文件,也可以是使用SELECT … INTO OUTFILE '/path/xx.txt';语句生成的单表纯文本数据文件,或者其他的方式生成的txt(只要生成的纯文本数据列按指定分隔符分割的纯文本数据文件即可)
从上面的帮助信息可以看到整个load data语句的语法结构,其中load data infile 'file.txt' into table tb_name; 是最基本的使用语句结构,其余的都为可选子句
1.1. 必选子句或关键字
load data语句简单示例
如果文本文件中的数据字段与表结构中的字段定义顺序相同,则直接使用如下语句载入即可
# 执行select ...into outfile语句导出文本文件
admin@localhost : xiaoboluo 10:11:19> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 10:31:10> select * from test3 into outfile "/tmp/test3.txt";
Query OK, 4 rows affected (0.00 sec)
# 使用python查看一下数据中的特殊符号,其中\t是制表符,用于字段分割,\n是换行符,用于行分割
>>> f = open('/tmp/test3.txt','r')
>>> data = f.readlines()
>>> data
['2\ta string\t100.20\n', '4\ta string containing a , comma\t102.20\n', '6\ta string containing a " quote\t102.20\n', '8\ta string containing a ", quote and comma\t102.20\n']
>>> for i in data:
... print i,
...
2 a string 100.20
4 a string containing a , comma 102.20
6 a string containing a " quote 102.20
8 a string containing a ", quote and comma 102.20
# 现在,truncate掉表test3,执行load data载入数据
admin@localhost : xiaoboluo 10:31:43> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 10:36:40> load data infile '/tmp/test3.txt' into table test3;
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 10:36:53> select * from test3; #这里可以看到,数据到处导入正常
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 10:36:58>
如果文本文件中的数据字段与表结构中的字段定义顺序不同,则使用如下语句指定载入表中的字段顺序
# 导出文本,导出文本时不使用select *,而是使用具体的字段,把顺序稍微调整一下
admin@localhost : xiaoboluo 10:36:58> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 10:40:46> desc test3; # 留意表的字段定义顺序,这里是id, test, test2
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| test | varchar(100) | YES | | NULL | |
| test2 | varchar(100) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
admin@localhost : xiaoboluo 10:41:35> select * from test3; # 留个表中各个字段的值大概是什么内容
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 10:40:17> select id,test2,test from test3 into outfile "/tmp/test3.txt";
Query OK, 4 rows affected (0.01 sec)
admin@localhost : xiaoboluo 10:40:41> system cat /tmp/test3.txt; #这里可以看到文本文件中的test字段值放到最后去了
2 100.20 a string
4 102.20 a string containing a , comma
6 102.20 a string containing a " quote
8 102.20 a string containing a ", quote and comma
# 现在,truncate掉表test3,执行load data载入数据
admin@localhost : xiaoboluo 10:47:31> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 10:48:43> load data infile '/tmp/test3.txt' into table test3(id,test2,test);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 10:49:13> select * from test3; #可以看到,使用(id,test2,test)子句指定了与文本文件中数据的字段一致的顺序,导入表表中之后数据的顺序是正确的
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
1.2. 可选子句或关键字
以下演示部分只针对部分子句或关键字做演示,并不是全部,悉知
1.2.1. LOW_PRIORITY关键字
如果load data语句使用了LOW_PRIORITY关键字,则在碰到其他会话操作相同表时,则会延迟执行LOAD DATA语句,直到其他会话操作表结束为止。这仅影响使用表级锁定的存储引擎(如MyISAM,MEMORY和MERGE),对于innodb存储引擎不起作用,因为innodb引擎是行级锁,对于load data新插入的不同的数据行之间的操作不会发生冲突。本小节不做演示,更多信息参考链接:https://dev.mysql.com/doc/refman/5.7/en/load-data.html
1.2.2. LOCAL关键字
1.2.2.1. 使用与不使用local关键字的流程
如果要载入的文本文件不在mysql server数据库本身的本地磁盘,客户端也不是从mysql server本机登录的,则需要使用local关键字,指定mysql server从client host本地加载该文件,需要mysql server端使用local_infile=true(或者设置为1,不设置时默认为1)启动,以及客户端连接mysql server时也使用local_infile=true(或者设置为1,不指定时默认为1)连接才能使用,server和client必须都开启这个参数才能使用local关键字,任意一个关闭都不能使用
# 登录到数据库,重新导出表数据到文本,并发送到10.10.30.250服务器
admin@localhost : xiaoboluo 10:51:57> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 11:10:19> select * from test3 into outfile "/tmp/test3.txt";
Query OK, 4 rows affected (0.01 sec)
admin@localhost : xiaoboluo 11:10:41> system scp /tmp/test3.txt 10.10.30.250:/tmp/
test3.txt 100% 146 0.1KB/s 00:00
admin@localhost : xiaoboluo 11:11:15>
# 登录到10.10.30.250服务器,远程连接10.10.30.241数据库
[root@e710d318-d5b4-4bc7-a606-d09f06ff5f5d ~]# mysql -uadmin -pletsg0 -h20.10.30.241
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 46
Server version: 5.7.18-log MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use xiaoboluo
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> system ls -lh /tmp/test3.txt;
-rw-r--r-- 1 root root 146 May 3 11:11 /tmp/test3.txt
mysql> system cat /tmp/test3.txt;
2 a string 100.20
4 a string containing a , comma 102.20
6 a string containing a " quote 102.20
8 a string containing a ", quote and comma 102.20
mysql> show variables like '%local%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| local_infile | ON |
+---------------+-------+
1 row in set (0.00 sec)
mysql> set global local_infile=OFF; #关闭server端的local_infile参数
Query OK, 0 rows affected (0.00 sec)
mysql> truncate test3;
Query OK, 0 rows affected (0.00 sec)
mysql> load data local infile '/tmp/test3.txt' into table test3; #执行导入数据时报错了
ERROR 1148 (42000): The used command is not allowed with this MySQL version
mysql> set global local_infile=ON; #重新打开server端的local_infile参数
Query OK, 0 rows affected (0.00 sec)
mysql> load data local infile '/tmp/test3.txt' into table test3; #导入成功
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
mysql> select * from test3; #查看数据,可以看到数据已成功导入表
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
# 对于客户端连接server时使用local_infile=0参数,在执行导入数据时也会报相同的错误,这里不做演示,示例命令:mysql -uadmin -pletsg0 -h20.10.30.241 --local-infile=0
使用local关键字与不使用local关键字时load data语句加载文本文件的流程
如果指定了LOCAL,则该客户端程序在客户端主机上读取load data语句需要的文件并将其发送到服务器。该文件可以使用完整路径名称来指定其位置。也可以使用相对路径,使用相对路径时,路径前缀为使用客户端程序时的工作目录,当使用带有LOAD DATA的LOCAL关键字时,会在mysql server端的临时目录中创建该文件的副本(注意:这个场景下该文件的路径不是由tmpdir或slave_load_tmpdir的值指定的,而是操作系统的临时目录/tmp,并且在MySQL server中这个存放副本的路径是不可配置的(通常,系统临时目录是Linux系统上的/tmp,Windows上的C:\WINDOWS\TEMP),要注意,如果在该临时目录下创建load文件的副本时发现磁盘空间不够,会导致LOAD DATA LOCAL语句执行失败。在客户端主机读取文本文件定位规则是如果指定的是绝对路径,则使用绝对路径,如果是相对路径,则在登录mysql server时的工作路径下查找,找不到就报错:ERROR 2 (HY000): File 'test3.txt' not found (Errcode: 2 - No such file or directory)
如果未指定LOCAL,则该文件必须位于mysql server端主机上,并由服务器直接读取。服务器使用以下规则来定位文件:
* 如果文件名是绝对路径名, mysql server将按照给定的路径读取文件
* 如果文件名是一个相对路径名,则mysql server将在server端的datadir下搜索该文件,如果load data语句指定了库名,则在datadir的指定库名下搜索文本文件,如果没有指定库名,则在默认数据库下搜索文本文件(load data语句没有指定库名时要正确执行语句必须先使用use db语句切库,so,这个就是默认库)
* 如果datadir下还找不到就报错:ERROR 13 (HY000): Can't get stat of '/datadir/xiaoboluo/test3.txt' (Errcode: 2 - No such file or directory)
PS:
非LOCAL方式只会加载位于mysql server上的文本文件。出于安全考虑,此类操作要求您具有FILE特权。而且,非本地加载操作也受到secure_file_priv系统变量的设置。如果变量值是非空目录名称,则要加载的文件必须位于该变量指定的目录中。如果变量值为空(这是不安全的,这个时候server本地导入由file权限控制,client远程主机的文件导入由server和client端的local_infile选项共同控制),则该文件只能由服务器读取。
使用LOCAL比让服务器直接访问文件要慢一些,因为文件的内容必须通过客户端的连接发送到服务器。另一方面,您不需要FILE权限来加载本地文件。可以使用local子句
使用local关键字时碰到唯一键值冲突时的处理方式与IGNORE关键字相同,忽略冲突的行
1.2.2.2. 使用local关键字的错误处理
使用LOCAL会影响错误处理行为:
使用LOAD DATA INFILE,数据解析碰到重复键时默认情况下会终止操作
admin@localhost : xiaoboluo 11:46:09> select * from test3 into outfile '/tmp/test3.txt';
Query OK, 4 rows affected (0.00 sec)
admin@localhost : xiaoboluo 01:40:32> desc test3; #留意id是主键
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| test | varchar(100) | YES | | NULL | |
| test2 | varchar(100) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
admin@localhost : xiaoboluo 01:40:51> select * from test3; # 留意id列值是2,4,6,8
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 01:40:57> system cat /tmp/test3.txt; #留意导出的文本中的第一列数值与表中的id列值相同
2 a string 100.20
4 a string containing a , comma 102.20
6 a string containing a " quote 102.20
8 a string containing a ", quote and comma 102.20
admin@localhost : xiaoboluo 01:41:21> load data infile '/tmp/test3.txt' into table test3; #执行load data时,不使用local关键字直接报主键冲突
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'
admin@localhost : xiaoboluo 01:41:32> show warnings;
+-------+------+---------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------+
| Error | 1062 | Duplicate entry '2' for key 'PRIMARY' |
+-------+------+---------------------------------------+
1 row in set (0.00 sec)
使用LOAD DATA LOCAL INFILE,数据解析碰到重复键时将发出警告,并且操作继续进行,因为server端无法在客户端操作期间停止传输文件。此时处理重复键与指定IGNORE关键字相同(即忽略主键冲突的数据行)
admin@localhost : xiaoboluo 01:46:52> load data local infile '/tmp/test3.txt' into table test3; #加上local关键字再次执行,可以发现不报错了,但是Skipped: 4 Warnings: 4表示跳过了4行数据导入,\
注意:如果不使用local关键字,在默认的sql_mode下无法导入,否则就需要去掉严格的sql_mode
Query OK, 0 rows affected, 4 warnings (0.00 sec)
Records: 4 Deleted: 0 Skipped: 4 Warnings: 4
Warning (Code 1062): Duplicate entry '2' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '4' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '6' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '8' for key 'PRIMARY'
admin@localhost : xiaoboluo 01:47:01> show warnings;
+---------+------+---------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------+
| Warning | 1062 | Duplicate entry '2' for key 'PRIMARY' |
| Warning | 1062 | Duplicate entry '4' for key 'PRIMARY' |
| Warning | 1062 | Duplicate entry '6' for key 'PRIMARY' |
| Warning | 1062 | Duplicate entry '8' for key 'PRIMARY' |
+---------+------+---------------------------------------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 01:47:05> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
1.2.3. REPLACE与IGNORE关键字
REPLACE和IGNORE关键字控制对唯一键值冲突行的处理:
如果指定了REPLACE关键字,则输入行将覆盖现有行。换句话说,与主键或唯一索引冲突的数据行将被执行覆盖写入,如果同时使用了local关键字,则与没有使用local关键字行为相同
如果指定了IGNORE关键字,则与唯一键值冲突的数据行将被丢弃,如果同时使用了local关键字,则与没有使用local关键字行为相同
如果不指定REPLACE和IGNORE任一选项,则行为取决于是否指定LOCAL关键字。没有LOCAL,则发现主键或唯一索引冲突时就报错终止load data语句执行,并忽略文本文件的其余部分的载入。如果使用了LOCAL关键字,则local关键字的默认行为与指定IGNORE时相同,这是因为server端无法在操作期间停止客户端的文件传输(不使用REPLACE与IGNORE关键字时的错误处理,详见1.2.2.2小节)
下面对使用REPLACE与IGNORE关键字进行演示
如果文本文件中的数据字段有与表结构中的唯一索引、主键索引冲突的,则使用REPLACE关键字,该关键字会对冲突的数据进行覆盖(内部转换为UPDATE,见后续章节示例)
# 使用replace 覆盖冲突数据行,不使用local关键字
admin@localhost : xiaoboluo 01:47:47> flush logs;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 02:16:34> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 02:16:48> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW |
+---------------+-------+
1 row in set (0.00 sec)
admin@localhost : xiaoboluo 02:17:26> show variables like '%tx_isolation%';
+---------------+----------------+
| Variable_name | Value |
+---------------+----------------+
| tx_isolation | READ-COMMITTED |
+---------------+----------------+
1 row in set (0.01 sec)
admin@localhost : xiaoboluo 02:18:01> update test3 set test2='111111111' where id=2; #修改其中一行数据
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
admin@localhost : xiaoboluo 02:20:39> select * from test3; #查看表中的数据,留意id=2的test2字段值为111111111
+----+------------------------------------------+-----------+
| id | test | test2 |
+----+------------------------------------------+-----------+
| 2 | a string | 111111111 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+-----------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 02:20:41> load data infile '/tmp/test3.txt' replace into table test3; #执行不带local关键字但带repalce关键字的语句
Query OK, 5 rows affected (0.00 sec)
Records: 4 Deleted: 1 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 02:20:50> select * from test3; #查询表中的数据,可以发现id=2的行的test2字段被覆盖为文本文件中的值100.20 了
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
# 解析最后一个binlog来查看一下(这里仅作参考,主库内部是否是binlog中记录的形式处理覆盖数据有待考证)
$ mysqlbinlog -vv --base64-output=decode-rows mysql-bin.000004
......
BEGIN
/*!*/;
# at 732
#170503 14:20:50 server id 3306241 end_log_pos 814 CRC32 0xb4da8deb Rows_query
# load data infile '/tmp/test3.txt' replace into table test3 #这里是执行的原始load data 的sql语句,要看到此内容,需要打开参数:binlog_rows_query_log_events=ON
# at 814
#170503 14:20:50 server id 3306241 end_log_pos 873 CRC32 0xe0066c03 Table_map: `xiaoboluo`.`test3` mapped to number 253
# at 873
#170503 14:20:50 server id 3306241 end_log_pos 958 CRC32 0xe95b171c Update_rows: table id 253 flags: STMT_END_F
### UPDATE `xiaoboluo`.`test3` #这里可以看到发生数据冲突的行被执行了update
### WHERE
### @1=2 /* INT meta=0 nullable=0 is_null=0 */
### @2='a string' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3='111111111' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### SET
### @1=2 /* INT meta=0 nullable=0 is_null=0 */
### @2='a string' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3='100.20' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
# at 958
#170503 14:20:50 server id 3306241 end_log_pos 989 CRC32 0x84b1f86e Xid = 364
COMMIT/*!*/;
......
也可以使用IGNORE关键字忽略冲突的行(注意,这里的ignore与ignore number lines子句中的ignore作用不同,不要搞混淆)
admin@localhost : xiaoboluo 02:26:45> flush logs;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 02:29:18> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 02:29:23> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW |
+---------------+-------+
1 row in set (0.00 sec)
admin@localhost : xiaoboluo 02:29:29> show variables like '%tx_isolation%';
+---------------+----------------+
| Variable_name | Value |
+---------------+----------------+
| tx_isolation | READ-COMMITTED |
+---------------+----------------+
1 row in set (0.00 sec)
admin@localhost : xiaoboluo 02:29:34> update test3 set test2='111111111' where id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
admin@localhost : xiaoboluo 02:29:41> select * from test3;
+----+------------------------------------------+-----------+
| id | test | test2 |
+----+------------------------------------------+-----------+
| 2 | a string | 111111111 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+-----------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 02:29:45> load data infile '/tmp/test3.txt' ignore into table test3; #这里可以看到Skipped: 4 Warnings: 4,表示4行数据都跳过了,注意:这是在sql_mode=''时导入的,\
如果不修改sql_mode请使用local关键字
Query OK, 0 rows affected, 4 warnings (0.00 sec)
Records: 4 Deleted: 0 Skipped: 4 Warnings: 4
Warning (Code 1062): Duplicate entry '2' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '4' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '6' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '8' for key 'PRIMARY'
admin@localhost : xiaoboluo 02:30:04> select * from test3; #查看一下执行load data语句之后的表数据,发现id=2的test2列值还是111111111 没有变化
+----+------------------------------------------+-----------+
| id | test | test2 |
+----+------------------------------------------+-----------+
| 2 | a string | 111111111 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+-----------+
4 rows in set (0.00 sec)
1.2.4. PARTITION子句
LOAD DATA支持使用PARTITION选项显式分区选择,其中包含一个或多个分区,子分区或两者名称的逗号分隔列表。当使用此选项时,如果文件中的任何行无法插入到列表中指定的任何分区或子分区中,则该语句将失败,并显示错误,找到与给定分区集不匹配的行,本小节不做演示,更多信息参考链接: https://dev.mysql.com/doc/refman/5.7/en/load-data.html
1.2.5. CHARACTER SET charset_name子句
导入的文本数据文件名必须以文字字符串形式给出。在Windows上,在路径名称中指定反斜杠为正斜杠或双倍反斜杠。 character_set_filesystem系统变量控制文件名的解释
服务器使用由character_set_database系统变量指定的字符集来解释文件中的内容。SET NAMES语句和character_set_client系统变量的设置不影响文件内容的字符集解析。如果输入文件的内容使用的字符集与server的默认值字符集不同,则建议先使用load data的CHARACTER SET子句指定解析文本文件内容的字符集
LOAD DATA INFILE将文件中的所有字段以相同的字符集进行解析,而不管加载字段列的数据类型定义的字符集如何。为了正确解释文件内容,您必须确保使用正确的字符集进行导出数据和导入数据。例如,如果您使用mysqldump -T或通过在mysql中执行SELECT … INTO OUTFILE语句导出数据文件时,建议使用--default-character-set选项指定一个对应数据的字符集,以便使用Load data语句导入数据时不会发生字符集错乱(使用mysqlimport的--default-character-set指定导出数据时的字符集,使用mysql命令行客户端的--default-character-set指定导出数据时的字符集,注:不能指定ucs2,utf16,utf16le或utf32字符集来加载数据文件)
本小节不做演示,更多信息参考链接: https://dev.mysql.com/doc/refman/5.7/en/load-data.html
1.2.6. FIELDS(与COLUMNS关键字相同)和LINES子句
以下示例中的char代表单个字符,string代表字符串(即多个字符),load data语句中,转义字符和字段引用符只能使用单个字符,字段分隔符、行分隔符、行前缀字符都可以使用多个字符(字符串)
对于LOAD DATA INFILE和SELECT … INTO OUTFILE语句中,FIELDS和LINES子句的语法完全相同。两个子句在LOAD DATA INFILE和SELECT … INTO OUTFILE语句中都是可选的,但如果两个子句都被指定,则FIELDS必须在LINES之前,否则报语法错误
FIELDS关键字共有三个子句,TERMINATED BY 'string'指定字段分隔符,[OPTIONALLY] ENCLOSED BY 'char'指定字段引用符(其中使用了OPTIONALLY关键字之后,只在char、varchar和text等字符型字段上加字段引用符,数值型的不会加字段引用符,且OPTIONALLY 关键字只在导出数据时才起作用,导入数据时用于不用对数据没有影响 ),ESCAPED BY 'char'指定转义符,如果您指定了一个FIELDS子句,则它的每个子句也是可选的,但在你指定了FIELDS关键字之后,这个关键字的子句至少需要指定一个,后续章节会进行举例说明
LINES关键字共有两个子句,STARTING BY 'string'指定行前缀字符,TERMINATED BY 'string'指定行分隔符(换行符),如果你指定了LINES关键字,则LINES的子句都是可选的,但在你指定了LINES关键字之后,这个关键字的子句至少需要指定一个,后续章节会进行举例说明
如果在导入和导出时没有指定FIELDS和LINES子句,则导入和导出时两个子句的默认值相同,默认的字段分隔符为\t,行分隔符为\n(win上默认为\r\n,记事本程序上默认为\r),字段引用符为空,行前缀字符为空
当mysql server导出文本数据到文件时,FIELDS和LINES默认值时SELECT … INTO OUTFILE在输出文本数据时行为如下:
在文本数据各字段之间使用制表符来作为字段分隔符
不使用任何引号来包围文本数据的各字段值,即字段引用符为空
使用\转义在字段值中出现的制表符\t,换行符\n或转义符本身\等特殊字符(即输出的文本数据中对这些特殊字符前多加一个反斜杠)
在行尾写上换行符\n,即使用\n作为行分隔符(换行符)
注意:如果您在Windows系统上生成了文本文件,则可能必须使用LINES TERMINATED BY '\r\n'来正确读取文件,因为Windows程序通常使用两个字符作为行终止符。某些程序(如写字板)在写入文件时可能会使用\r作为行终止符(要读取这些文件,请使用LINES TERMINATED BY '\r')
FIELDS和LINES子句默认值时生成的纯文本数据文件可以使用python代码来读取文件查看文件中的制表符和换行符(linux下的cat和vim等编辑器默认会解析\t为制表符,\n为换行符,所以使用这些命令可能无法看到这些特殊符号)
>>> f = open('/tmp/test3.txt','r')
>>> data = f.readlines()
>>> data
['2,"a string","100.20"\n', '4,"a string containing a , comma","102.20"\n', '6,"a string containing a \\" quote","102.20"\n', '8,"a string containing a \\", quote and comma","102.20"\n']
>>> for i in data:
... print i,
...
2,"a string","100.20"
4,"a string containing a , comma","102.20"
6,"a string containing a \" quote","102.20"
8,"a string containing a \", quote and comma","102.20"
当mysql server从文本文件读取数据时,FIELDS和LINES默认值会导致LOAD DATA INFILE的行为如下:
寻找换行边界字符\n来进行换行
不跳过行前缀,把行前缀也当作数据(发生在如果导出数据时使用了行前缀,导入时没有指定正确的行前缀或者根本没有指定行前缀选项时)
使用制表符\t来分割一行数据中的各列
要注意:在FIELDS和LINES的默认值下,在解析文本文件时不会把字符串之间的引号当作真正的引号,而是当作数据
1.2.6.1. FIELDS关键字及其子句详解
字段分隔符,默认是\t,使用子句 fields terminated by 'string' 指定,其中string代表指定的字段分隔符
admin@localhost : xiaoboluo 03:08:34> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
Query OK, 4 rows affected (0.00 sec)
admin@localhost : xiaoboluo 03:08:37> system cat /tmp/test3.txt
2,a string,100.20
4,a string containing a \, comma,102.20
6,a string containing a " quote,102.20
8,a string containing a "\, quote and comma,102.20
字段引用符,如果加optionally选项则只用在char、varchar和text等字符型字段上,数值类型会忽略使用引用符,如果不指定该子句,则默认不使用引用符,使用子句fields [optionally] enclosed by 'char'指定,其中char代表指定的字段引用符
# 指定字段引用符为",不使用optionally关键字
admin@localhost : xiaoboluo 03:33:33> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 03:37:21> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"';
Query OK, 5 rows affected (0.00 sec)
admin@localhost : xiaoboluo 03:37:33> system cat /tmp/test3.txt
"2" "a string" "100.20"
"4" "a string containing a , comma" "102.20"
"6" "a string containing a \" quote" "102.20"
"8" "a string containing a \", quote and comma" "102.20"
"10" "\\t" "102.20"
# 指定字段引用符为",使用optionally关键字,可以看到id列的字段引用符去掉了
admin@localhost : xiaoboluo 03:37:41> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 03:40:53> select * from test3 into outfile "/tmp/test3.txt" FIELDS optionally ENCLOSED BY '"';
Query OK, 5 rows affected (0.00 sec)
admin@localhost : xiaoboluo 03:41:03> system cat /tmp/test3.txt
2 "a string" "100.20"
4 "a string containing a , comma" "102.20"
6 "a string containing a \" quote" "102.20"
8 "a string containing a \", quote and comma" "102.20"
10 "\\t" "102.20"
转义字符,默认为\,使用子句fields escaped by 'char' 指定,其中char代表指定的转义字符
admin@localhost : xiaoboluo 03:42:41> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 03:44:18> select * from test3 into outfile "/tmp/test3.txt" fields escaped by '.';
Query OK, 5 rows affected (0.00 sec)
admin@localhost : xiaoboluo 03:44:25> system cat /tmp/test3.txt # 可以看到数据中指定的转义符.号被转义了,而数据\t没有被转义
2 a string 100..20
4 a string containing a , comma 102..20
6 a string containing a " quote 102..20
8 a string containing a ", quote and comma 102..20
10 \t 102..20
admin@localhost : xiaoboluo 03:44:28> truncate test3; #清空表
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 03:45:19> load data infile "/tmp/test3.txt" into table test3 fields escaped by '.'; #导入数据时指定转义符为.号
Query OK, 5 rows affected (0.00 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 03:45:40> select * from test3; #校验数据,可以看到导入数据正常
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
+----+------------------------------------------+--------+
5 rows in set (0.00 sec)
1.2.6.2. LINES 关键字及其子句详解
行前缀字符串,使用子句lines starting by 'string' 指定,其中string代表指定的行前缀字符串,行前缀字符串在导出文本数据时使用该子句指定,在导入文本时在一行数据中如果发现了行前缀字符串,则只导入从前缀字符串开始之后的数据部分,前缀字符本身及其之前的数据被忽略掉,如果某行数据不包含行前缀字符串,则整行数据都会被忽略
如果您想要读取的纯文本文件中所有行都有一个您想要忽略的公用前缀,则可以使用LINES STARTING BY'prefix_string'来跳过这个前缀,以及前缀字符前面的任何内容。如果某行数据不包含前缀字符,则跳过整行内容,例如:
# load data语句如下
admin@localhost : xiaoboluo 03:48:04> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 03:54:54> select * from test3 into outfile "/tmp/test3.txt" LINES STARTING BY 'xxx';
Query OK, 5 rows affected (0.00 sec)
admin@localhost : xiaoboluo 03:55:03> system cat /tmp/test3.txt #可以看到每行数据前面多了个行前缀字符串xxx
xxx2 a string 100.20
xxx4 a string containing a , comma 102.20
xxx6 a string containing a " quote 102.20
xxx8 a string containing a ", quote and comma 102.20
xxx10 \\t 102.20
# 现在,到shell命令行去修改一下,增加两行
admin@localhost : xiaoboluo 03:55:50> system cat /tmp/test3.txt # 最后要加载的纯文本数据内容如下
xxx2 a string 100.20
xxx4 a string containing a , comma 102.20
xxx6 a string containing a " quote 102.20
xxx8 a string containing a ", quote and comma 102.20
xxx10 \\t 102.20
12 \\t 102.20
dfadsfasxxx14 \\t 102.20
admin@localhost : xiaoboluo 03:59:03> truncate test3; #清空表
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 03:59:38> load data infile "/tmp/test3.txt" into table test3 LINES STARTING BY 'xxx'; #导入数据,指定行前缀字符为xxx
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 03:59:44> select * from test3; #校验表数据,可以看到没有xxx行前缀的行被忽略了,而包含xxx的最后一行,从xxx开始截断,xxx字符本身及其之前的内容被忽略,\
xxx之后的内容被解析为行数据导入了
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
| 14 | \t | 102.20 |
+----+------------------------------------------+--------+
6 rows in set (0.00 sec)
行结束符(换行符),linux下默认为\n,使用子句lines terminated by 'string' 指定,其中string代表指定的换行符
# 指定换行符为\r\n导出数据
admin@localhost : xiaoboluo 03:59:49> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 04:02:22> select * from test3 into outfile "/tmp/test3.txt" lines terminated by '\r\n';
Query OK, 6 rows affected (0.00 sec)
# 由于linux的一些命令本身会解析掉这些特殊字符,所以使用python来查看这个文本文件中的换行符,从下面的结果中可以看到,列表的每一个元素代表一行数据,每一个元素的\
末尾的\r\n就是这行数据的换行符
>>> f = open('/tmp/test3.txt','r')
>>> data = f.readlines()
>>> data
['2\ta string\t100.20\r\n', '4\ta string containing a , comma\t102.20\r\n', '6\ta string containing a " quote\t102.20\r\n', '8\ta string containing a ", quote and comma\t102.20\r\n', '10\t\\\\t\t102.20\r\n', \
'14\t\\\\t\t102.20\r\n']
>>>
# 现在,把数据重新导入表,从下面的结果中可以看到,导入表中的数据正确
admin@localhost : xiaoboluo 04:02:39> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 04:04:55> load data infile "/tmp/test3.txt" into table test3 lines terminated by '\r\n';
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 04:05:11> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
| 14 | \t | 102.20 |
+----+------------------------------------------+--------+
6 rows in set (0.00 sec)
1.2.6.3. FIELDS和LINES注意事项
众所周知,MySQL中反斜杠是SQL语句中特殊字符的转义字符,因此在sql语句中碰到特殊字符时,您必须指定一个或者两个反斜杠来为特殊字符转义(如在mysql中或者一些其他程序中,\n代表换行符,\t代表制表符,\代表转义符,那么需要使用\t来转义制表符,\n来转义换行符,\来转义转义符本身,这样才能正确写入数据库或者生成导出的数据文本,使用FIELDS ESCAPED BY子句指定转义符
特殊字符列表如下
\0 ASCII NUL (X'00') 字符
\b 退格字符
\n 换行符
\r 回车符
\t 制表符
\Z ASCII 26 (Control+Z)
\N NULL值,如果转义符值为空,则会直接导出null字符串作为数据,这在导入时将把null作为数据导入,而不是null符号
如果数据中包含了ENCLOSED BY '"'子句指定字段引用符号,则与字段引用符号相同数据字符也会被自动添加一个反斜杠进行转义(如果转义符指定为空,则可能会导致数据在导入时无法正确解析)。如果数据中包含了FIELDS TERMINATED BY 子句指定的字段分隔符,则以FIELDS ENCLOSED BY子句指定的字段引用符号为准,被引起来的整个部分作为一整列的数据,列值之间的数据包含字段分隔符不会被转义,而是作为数据处理,但数据中包含的字段引用符会被转义(在数据中包含了字段分隔符的情况下,如果字段引用符号没有指定或者指定为空值,则可能在导入数据时无法正确解析)。如果数据中包含了FIELDS ESCAPED BY子句指定的转义符,字段引用符和行分隔符使用默认值,则在数据中的转义符会被转义(只要不为空,则不管字段分隔符和转义字符定义为什么值,都会被转义),默认情况下,不建议随意更改换行符和转义符,除非必须且你需要校验修改之后数据能够正确导入
# 字段引用符为",数据中包含",转义符和换行符保持默认,导入数据时不会有任何问题
admin@localhost : xiaoboluo 09:46:14> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 09:46:17> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"';
Query OK, 4 rows affected (0.00 sec)
admin@localhost : xiaoboluo 09:46:23> system cat /tmp/test3.txt;
2 "a string" "100.20"
4 "a string containing a , comma" "102.20"
6 "a string containing a \" quote" "102.20"
8 "a string containing a \", quote and comma" "102.20" # 可以看到与字段引用符相同的符号数据被转义了
admin@localhost : xiaoboluo 09:54:41> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 09:58:45> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
# 如果字段引用符为",字段分隔符为,且数据中包含字段引用符"和字段分隔符,,转义符和换行符保持默认,这在导入数据时不会有任何问题
admin@localhost : xiaoboluo 09:53:45> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
Query OK, 4 rows affected (0.00 sec)
admin@localhost : xiaoboluo 09:54:29> system cat /tmp/test3.txt;
2,"a string","100.20"
4,"a string containing a , comma","102.20"
6,"a string containing a \" quote","102.20"
8,"a string containing a \", quote and comma","102.20"
admin@localhost : xiaoboluo 09:54:41> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 09:58:45> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
# 但是,如果在字段引用符为",数据中包含",字段分隔符使用逗号,换行符保持默认的情况下,转义符使用了空串,这会导致在导入数据时,第四行无法正确解析,报错
admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 09:58:45> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
+----+------------------------------------------+--------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 09:58:49> select * from test3 into outfile "/tmp/test3_test.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
Query OK, 4 rows affected (0.00 sec)
admin@localhost : xiaoboluo 10:00:42> system cat /tmp/test3_test.txt;
2,"a string","100.20"
4,"a string containing a , comma","102.20"
6,"a string containing a " quote","102.20" #关于这一行数据,需要说明一下ENCLOSED BY子句,该子句指定的引用符号从一个FIELDS TERMINATED BY子句指定的分隔符开始,直到碰到下一个\
分隔符之间且这个分隔符前面一个字符必须是字段引用符号(如果这个分隔符前面一个字符不是字段引用符,则继续往后匹配,如第二行数据),在这之间的内容都会被当作整个列字符串处理,\
所以这一行数据在导入时不会发生解析错误
8,"a string containing a ", quote and comma","102.20" #这一行因为无法正确识别的字段结束位置,所以无法导入,报错终止,前面正确的行也被回滚掉(binlog_format=row)
admin@localhost : xiaoboluo 10:00:49> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 10:01:03> load data infile '/tmp/test3_test.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
ERROR 1262 (01000): Row 4 was truncated; it contained more data than there were input columns
admin@localhost : xiaoboluo 10:01:33> select * from test3;
Empty set (0.00 sec)
# 数据中包含了默认的转义符和指定的字段分隔符,字段引用符和行分隔符使用默认值,则在数据中的转义符和字段分隔符会被转义(只要不为空,则不管字段分隔符和转义字符定义为什么值,\
都会被转义)
admin@localhost : xiaoboluo 03:08:45> insert into test3(test,test2) values('\\t','102.20');
Query OK, 1 row affected (0.00 sec)
admin@localhost : xiaoboluo 03:17:29> select * from test3;
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
+----+------------------------------------------+--------+
5 rows in set (0.00 sec)
admin@localhost : xiaoboluo 03:17:32> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 03:17:39> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
Query OK, 5 rows affected (0.01 sec)
admin@localhost : xiaoboluo 03:17:42> system cat /tmp/test3.txt
2,a string,100.20
4,a string containing a \, comma,102.20
6,a string containing a " quote,102.20
8,a string containing a "\, quote and comma,102.20
10,\\t,102.20
当您使用SELECT … INTO OUTFILE与LOAD DATA INFILE一起将数据从数据库写入文件,然后再将该文件读回数据库时,两个语句的FIELDS和LINES处理选项必须匹配。否则,LOAD DATA INFILE将解析错误的文件内容,示例
# 假设您执行SELECT ... INTO OUTFILE语句时使用了逗号作为列分隔符:
SELECT * INTO OUTFILE 'data.txt'
FIELDS TERMINATED BY ','
FROM table2;
# 如果您尝试使用\t作为列分隔符,则它将无法正常工作,因为它会指示LOAD DATA INFILE在字段之间查找制表符,可能导致每个数据行整行解析时被当作单个字段:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
# 要正确读取逗号分隔各列的文件,正确的语句是
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY ',';
任何FIELDS和LINES处理选项都可以指定一个空字符串(''),但强烈不建议在FIELDS TERMINATED BY、FIELDS ESCAPED BY 和LINES TERMINATED BY子句中使用空串(空格不算空串)作为转义符和换行符,可能导致许多意外的问题,除非你确定使用空串不会出现问题。如果不为空,注意FIELDS [OPTIONALLY] ENCLOSED BY和FIELDS ESCAPED BY子句指定的值只能指定单个字符(即字段引用符号和转义符只能使用单个字符)。但 FIELDS TERMINATED BY, LINES STARTING BY, and LINES TERMINATED BY子句的值可以是多个字符(即字段分隔符和换行符、行前缀字符可以使用多个字符)。例如,指定一个LINES TERMINATED BY'\r\ n'子句,表示指定行换行符为\r\n,这个也是WIN下的换行符
# 如果LINES TERMINATED BY换行符指定了一个空字符,并且FIELDS TERMINATED BY字段分隔符指定的是非空的一个字符(或者使用默认值\t),则行也会以字段分隔符作为行的结束符\
(表现行为就是文本中最后一个字符就是字段分隔符),即整个文本看上去就是一整行数据了
admin@localhost : xiaoboluo 04:48:35> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 04:53:59> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',' lines terminated by '';
Query OK, 6 rows affected (0.00 sec)
# 使用python查看文本内容,从下面的结果中可以看到,整个表的数据由于换行符为空,所以导致都拼接为一行了,最后行结束符使用了字段分隔符逗号
>>> f = open('/tmp/test3.txt','r')
>>> data = f.readlines()
>>> data
['2,a string,100.20,4,a string containing a \\, comma,102.20,6,a string containing a " quote,102.20,8,a string containing a "\\, quote and comma,102.20,10,\\\\t,102.20,14,\\\\t,102.20,']
>>>
# 导入数据到表,这里新建一张表来进行导入测试,预防清理掉了表数据之后,文本内容又无法正确导入的情况发生
admin@localhost : xiaoboluo 04:57:52> create table test4 like test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 04:57:59> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY ',' lines terminated by '';
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 04:58:26> select * from test4; #从查询结果上看,数据正确导入表test4中了
+----+------------------------------------------+--------+
| id | test | test2 |
+----+------------------------------------------+--------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
| 14 | \t | 102.20 |
+----+------------------------------------------+--------+
6 rows in set (0.00 sec)
# 如果FIELDS TERMINATED BY和FIELDS ENCLOSED BY值都为空(''),则使用固定行(非限制)格式。使用固定行格式时,字段之间使用足够宽的空格来分割各字段。对于数据类型\
是TINYINT,SMALLINT,MEDIUMINT,INT和BIGINT,字段宽度分别为4,6,8,11和20个空格(无论数据类型声明的显示宽度如何),对于varchar类型使用大约298个空格(这个空格数量是自己\
数的。。。,猜想这个空格数量可能与字符集,varchar定义长度有关,因为我在尝试把varchar定义为50个字符的时候,空格少了156个左右)
admin@localhost : xiaoboluo 04:58:31> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 05:04:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY '' lines terminated by '';
Query OK, 6 rows affected (0.00 sec)
admin@localhost : xiaoboluo 05:04:17> system cat /tmp/test3.txt #下面展示内容中把打断空格使用...代替
2 a string ... 100.20 ... 4 a string containing a , comma ... 102.20 ... 6 a string containing a " quote ...102.20 ... 8 a string containing a ", quote and comma ... 102.20 \
... 10 \\t ... 102.20 ... 14 \\t ... 102.20 ... admin@localhost : xiaoboluo 05:04:35>
# 现在,清理掉test4表,并载入数据,从下面的结果中可以看到,导入表中之后,虽然数据是对的,但是多了非常多的空格,那么也就意味着你需要使用程序正确地处理一下这些多余的空格之后,\
再执行导入
admin@localhost : xiaoboluo 05:06:19> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 05:06:47> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY '' lines terminated by ''; # 注意:这是在sql_mode=''时导入的,如果不修改\
sql_mode请使用local关键字
Query OK, 6 rows affected, 12 warnings (0.01 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 12
Note (Code 1265): Data truncated for column 'test' at row 1
Note (Code 1265): Data truncated for column 'test2' at row 1
Note (Code 1265): Data truncated for column 'test' at row 2
Note (Code 1265): Data truncated for column 'test2' at row 2
Note (Code 1265): Data truncated for column 'test' at row 3
Note (Code 1265): Data truncated for column 'test2' at row 3
Note (Code 1265): Data truncated for column 'test' at row 4
Note (Code 1265): Data truncated for column 'test2' at row 4
Note (Code 1265): Data truncated for column 'test' at row 5
Note (Code 1265): Data truncated for column 'test2' at row 5
Note (Code 1265): Data truncated for column 'test' at row 6
Note (Code 1265): Data truncated for column 'test2' at row 6
admin@localhost : xiaoboluo 05:07:09> select * from test4;
+----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| id | test | test2 |
+----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 2 | a string | 100.20 |
| 4 | a string containing a , comma | 102.20 |
| 6 | a string containing a " quote | 102.20 |
| 8 | a string containing a ", quote and comma | 102.20 |
| 10 | \t | 102.20 |
| 14 | \t | 102.20 |
+----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
NULL值的处理根据使用的FIELDS和LINES选项而有所不同
# 对于默认的FIELDS和LINES值,NULL值被转义为\N输出,字段值\N读取时使用NULL替换并输入(假设ESCAPED BY字符为\)
admin@localhost : xiaoboluo 05:17:07> alter table test3 add column test3 varchar(10); #添加一个字段test3,默认值会被填充为null
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
admin@localhost : xiaoboluo 05:17:33> select * from test3; #查看表中的test3列数据
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | NULL |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
admin@localhost : xiaoboluo 05:17:37> select * from test3 into outfile "/tmp/test3.txt"; #执行导出
Query OK, 6 rows affected (0.00 sec)
admin@localhost : xiaoboluo 05:18:02> system cat /tmp/test3.txt #查看导出的文本文件,可以发现null被转义为\N了,这是为了避免数据字符串本身包含null值时无法正确区分数据类型的null值
2 a string 100.20 \N
4 a string containing a , comma 102.20 \N
6 a string containing a " quote 102.20 \N
8 a string containing a ", quote and comma 102.20 \N
10 \\t 102.20 \N
14 \\t 102.20 \N
# 导入数据,从结果中可以看到\N被正确解析为了数据类型的null值
admin@localhost : xiaoboluo 05:18:06> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 05:20:36> load data infile '/tmp/test3.txt' into table test3;
Query OK, 6 rows affected (0.01 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 05:20:52> select * from test3;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | NULL |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
# 如果FIELDS ENCLOSED BY不为空,FIELDS escaped BY为空时,则将NULL值的字面字符串作为输出字符值。这与FIELDS ENCLOSED BY引用的字符串中包含的null值不同,\
后者读取为字符串'null',而前者读取到数据库中时被当作数据类型的null值,而不是数据的字符串null
admin@localhost : xiaoboluo 05:20:57> update test3 set test3='null' where id=2; #更新id=2的test3列值为数据字符串的null
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
admin@localhost : xiaoboluo 05:23:14> select * from test3;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | null |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
admin@localhost : xiaoboluo 05:23:16> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 05:24:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"' escaped BY ''; #指定行引用符号为双引号",转义符为空导出数据
Query OK, 6 rows affected (0.00 sec)
admin@localhost : xiaoboluo 05:24:51> system cat /tmp/test3.txt #查看导出的文本文件,可以看到数据字符串的null被加了双引号,而数据类型的null没有加双引号
"2" "a string" "100.20" "null"
"4" "a string containing a , comma" "102.20" NULL
"6" "a string containing a " quote" "102.20" NULL
"8" "a string containing a ", quote and comma" "102.20" NULL
"10" "\t" "102.20" NULL
"14" "\t" "102.20" NULL
admin@localhost : xiaoboluo 05:24:57> alter table test4 add column test3 varchar(10);
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
admin@localhost : xiaoboluo 05:26:40> truncate test4; #这里使用test4表做测试,避免无法导入的情况发生
Query OK, 0 rows affected (0.00 sec)
admin@localhost : xiaoboluo 05:26:44> load data infile '/tmp/test3.txt' into table test4 FIELDS ENCLOSED BY '"' escaped BY ''; #指定字段引用符为双引号",转义符为空导入数据
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 05:27:02> select * from test4; #查看表test4中的数据,从结果中可以看到,数据导入正确
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | null |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
# 使用固定行格式(当FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空时使用),将NULL写为空字符串。这将导致表中的NULL值和空字符串在写入文件时无法区分,\
因为它们都以空字符串形式写入文本文件。如果您需要能够在读取文件时将其分开,则不应使用固定行格式(即不应该使用FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空)
admin@localhost : xiaoboluo 05:29:11> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 05:29:22> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '' TERMINATED BY '';
Query OK, 6 rows affected (0.00 sec)
admin@localhost : xiaoboluo 05:29:43> system cat /tmp/test3.txt #从结果中看,是不是有点似曾相识呢?没错,前面演示过FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空的情况,\
使用了固定格式来导出文本,但是这里多了数据类型的null值处理,从下面的结果中已经看不到数据类型的null了,被转换为了空值(下面展示时把大段空格使用...代替)
2 a string ... 100.20 ... null
4 a string containing a , comma ... 102.20 ...
6 a string containing a " quote ... 102.20 ...
8 a string containing a ", quote and comma ... 102.20 ...
10 \\t ... 102.20 ...
14 \\t ... 102.20 ...
admin@localhost : xiaoboluo 05:29:46> truncate test4; #清空test4
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 05:34:15> load data infile "/tmp/test3.txt" into table test4 FIELDS ENCLOSED BY '' TERMINATED BY ''; #执行导入文本到test4表,注意:这是在sql_mode=''时导入的,\
如果不修改sql_mode请使用local关键字
Query OK, 6 rows affected, 24 warnings (0.01 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 24
Note (Code 1265): Data truncated for column 'test' at row 1
Note (Code 1265): Data truncated for column 'test2' at row 1
Note (Code 1265): Data truncated for column 'test3' at row 1
Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
Note (Code 1265): Data truncated for column 'test' at row 2
Note (Code 1265): Data truncated for column 'test2' at row 2
Note (Code 1265): Data truncated for column 'test3' at row 2
Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
Note (Code 1265): Data truncated for column 'test' at row 3
Note (Code 1265): Data truncated for column 'test2' at row 3
Note (Code 1265): Data truncated for column 'test3' at row 3
Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
Note (Code 1265): Data truncated for column 'test' at row 4
Note (Code 1265): Data truncated for column 'test2' at row 4
Note (Code 1265): Data truncated for column 'test3' at row 4
Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
Note (Code 1265): Data truncated for column 'test' at row 5
Note (Code 1265): Data truncated for column 'test2' at row 5
Note (Code 1265): Data truncated for column 'test3' at row 5
Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
Note (Code 1265): Data truncated for column 'test' at row 6
Note (Code 1265): Data truncated for column 'test2' at row 6
Note (Code 1265): Data truncated for column 'test3' at row 6
Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
admin@localhost : xiaoboluo 05:34:35> select * from test4; #查询test4表,从下面的结果中可以看到,原本test2字段的数据被导入到了test3字段,而test3字段的内容被截断了。。
+----+----------------------------------------------------+----------------------------------------------------+------------+
| id | test | test2 | test3 |
+----+----------------------------------------------------+----------------------------------------------------+------------+
| 2 | a string | | 100.20 |
| 4 | a string containing a , comma | | 102.20 |
| 6 | a string containing a " quote | | 102.20 |
| 8 | a string containing a ", quote and comma | | 102.20 |
| 10 | \t | | 102.20 |
| 14 | \t | | 102.20 |
+----+----------------------------------------------------+----------------------------------------------------+------------+
6 rows in set (0.00 sec)
# 注意:如果使用多字节字符集,固定大小格式可能不起作用(我在测试时使用的字符集是utf8,没有测试出来这里说不起作用是啥意思)
load data执行时如果表中有外键、辅助索引、唯一索引,那么会导致加载数据的时间变慢,因为索引也需要一同更新,可以使用对应参数关闭外键检查、唯一索引检查甚至关闭索引
要在加载操作期间忽略外键约束,可以在执行load data语句之前执行SET foreign_key_checks = 0语句,执行完毕之后执行SET foreign_key_checks = 1或断开会话重连
要在加载操作期间忽略唯一索引约束,可以在执行load data语句之前执行set unique_checks=0语句,执行完毕之后执行set unique_checks=1或断开会话重连
在某些极端情况下(比如表中索引过多),您可以在执行load data语句之前通过执行ALTER TABLE … DISABLE KEYS语句关闭创建索引,在执行完load data语句之后执行ALTER TABLE … ENABLE KEYS来重新创建索引,注意该语句不能关闭主键索引
如果在sql_mode设置为严格模式下,且不使用local和ignore关键字时,碰到缺少字段值会直接报错终止,但在sql_mode设置为严格模式下,使用了local和ignore关键字时,则行为与不使用严格模式类似
LOAD DATA INFILE将所有输入视为字符串,因此您不能认为load data语句会像INSERT语句那样插入ENUM或SET列的数值。所有ENUM和SET值必须指定为字符串
LOAD DATA INFILE不支持的场景
固定大小的行(FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空)不支持BLOB或TEXT列
如果FIELDS TERMINATED BY和LINES STARTING BY指定相同的字符,则LOAD DATA INFILE无法正确解析
如果FIELDS ESCAPED BY为空,则字段中包含了FIELDS ENCLOSED BY或LINES TERMINATED BY或FIELDS TERMINATED BY的字符时会导致LOAD DATA INFILE语句拒绝读取字段并报错。这是因为LOAD DATA INFILE无法正确确定字段或行在哪里结束
PS:在Unix上,如果需要LOAD DATA从管道读取数据,可以使用以下方法(该示例将/目录的列表加载到表db1.t1中,find命令挂后台持续查找内容并生成ls.dat文件,mysql 客户端使用-e选项来执行load data这个文件到表):
mkfifo /mysql/data/db1/ls.dat
chmod 666 /mysql/data/db1/ls.dat
find / -ls> /mysql/data/db1/ls.dat&
mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1
1.2.7. IGNORE number {LINES | ROWS}子句
忽略输入文件中的前number行数据,使用子句ignore number lines指定忽略文本的前number行,在某些情况下生成的文本(如:mysql -e "select …." > xx.txt中)带有字段名称,在导入时会把这一行字段名称也当作数据,所以需要忽略掉这行字段名称
admin@localhost : xiaoboluo 05:34:41> system cat /tmp/test3.txt
id test test2 test3
2 a string 100.20 null
4 a string containing a , comma 102.20 NULL
6 a string containing a " quote 102.20 NULL
8 a string containing a ", quote and comma 102.20 NULL
10 \\t 102.20 NULL
14 \\t 102.20 NULL
admin@localhost : xiaoboluo 05:41:35> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 05:41:41> load data infile "/tmp/test3.txt" into table test4 ignore 1 lines; #载入文本时指定ignore 1 lines子句忽略文本中的前1行数据
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 05:42:22> select * from test4; #查询表test4中的数据,从下面的结果中可以看到数据正确
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | null |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
LOAD DATA INFILE可用于读取外部数据源文件。例如,许多程序可以以逗号分隔的值(CSV)格式导出数据,字段用逗号分隔,并包含在双引号内,并带有一个字段列名的初始行。如果这样一个文件中的数据行的换行符再使用回车符,则load data语句可以这样编写:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
# 如果输入值不一定包含在引号内,请在ENCLOSED BY关键字之前使用OPTIONALLY,如:OPTIONALLY ENCLOSED BY '"',加上OPTIONALLY 可能会忽略数值类型的字段的引用符号,\
另外,如果你的csv文件第一行是数据而不是列名,那就不能使用IGNORE 1 LINES子句
1.2.8. (col_name_or_user_var,…)指定字段名称的子句
默认情况下,如果使用load data语句时表名后边不带字段,那么会把整个表的字段数据都导入到数据库中,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata;
如果只想加载某些列,请指定列列表,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata(col1,col2,…); ,要注意:如果输入文件中的字段值顺序与表中列的顺序不同,你需要对load data语句中的tb_name后跟的字段顺序做一下调整以对应文本文件中的字段顺序。否则,MySQL不能判断如何与表中的顺序对齐,列出列名时可以在tb_name后指定具体的列名,也可以使用表达式生成值指定给某个列名(使用set语句指定一个表达式,复制给一个变量,详见1.2.9小节),如果没有set语句,建议列名写在tb_name表名后边,方便理解,有set语句时就跟set语句写在一起
郑州不孕不育医院:http://yyk.39.net/zz3/zonghe/1d427.html
如果发现文件中的列顺序和表中的列顺序不符,或者只想加载部分列,在命令中加上列的顺序时指定的字段名也不一定非要放在紧跟着表名,可以放在语句最后面也可以,如:load data infile "/tmp/filename.txt" into table emp fields terminated by ',' enclosed by '"' ignore 2 lines (id,content,name);如果只需要导入一个字段id,则把 (id,content,name)换做(id)即可
使用示例参考1.1小节的“如果文本文件中的数据字段与表结构中的字段定义顺序不同,则使用如下语句指定载入表中的字段顺序”演示部分
1.2.8. SET col_name = expr,…子句
将列做一定的数值转换后再加载,使用子句set col_name = expr,.. 指定,要注意:col_name必须为表中真实的列名,expr可以是任意的表达式或者子查询,只要返回的数据结果值能对应上表中的字段数据定义类型即可,注意,非set语句生成的列名,必须使用括号括起来,否则报语法错误。
# 如果系统将id列的文本数据加上10以后再加载到表的test3列中,可以如下操作:
admin@localhost : xiaoboluo 06:05:42> system rm -f /tmp/test3.txt;
admin@localhost : xiaoboluo 06:06:00> select * from test3 into outfile "/tmp/test3.txt";
Query OK, 6 rows affected (0.00 sec)
admin@localhost : xiaoboluo 06:06:04> system cat /tmp/test3.txt
2 a string 100.20 null
4 a string containing a , comma 102.20 \N
6 a string containing a " quote 102.20 \N
8 a string containing a ", quote and comma 102.20 \N
10 \\t 102.20 \N
14 \\t 102.20 \N
admin@localhost : xiaoboluo 06:07:49> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 06:07:53> load data infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ;
ERROR 1262 (01000): Row 1 was truncated; it contained more data than there were input columns
admin@localhost : xiaoboluo 06:08:02> select * from test4; #严格模式下因为文本中多了一个字段被截断了,所以拒绝导入
Empty set (0.00 sec)
admin@localhost : xiaoboluo 06:08:08> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ; #可以使用local关键字强制进行截断最后一个字段的null值列进行导入,\
注意,如果不使用local关键字,那就需要修改sql_mode才能导入
Query OK, 6 rows affected, 6 warnings (0.01 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
admin@localhost : xiaoboluo 06:10:45> select * from test4;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | 12 |
| 4 | a string containing a , comma | 102.20 | 14 |
| 6 | a string containing a " quote | 102.20 | 16 |
| 8 | a string containing a ", quote and comma | 102.20 | 18 |
| 10 | \t | 102.20 | 20 |
| 14 | \t | 102.20 | 24 |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
# 或者使用txt文件中的某些列进行计算后生成新的列插入,这里演示两个字段进行相加后导入另外一个字段中:
admin@localhost : xiaoboluo 06:18:37> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+test2 ; # 注意,如果不使用local关键字,那就需要修改sql_mode才能导入
Query OK, 6 rows affected, 6 warnings (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
admin@localhost : xiaoboluo 06:19:07> select * from test4;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | 102.2 |
| 4 | a string containing a , comma | 102.20 | 106.2 |
| 6 | a string containing a " quote | 102.20 | 108.2 |
| 8 | a string containing a ", quote and comma | 102.20 | 110.2 |
| 10 | \t | 102.20 | 112.2 |
| 14 | \t | 102.20 | 116.2 |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
SET子句中使用用户变量,用户变量可以以多种方式使用
# 可以直接使用一个用户变量并进行计算(计算表达式可以使用函数、运算符、子查询等都允许),然后赋值给test4列直接导入,而不需要从文件中读取test4列数据,该列数据也允许在文件中不存在
admin@localhost : xiaoboluo 06:27:06> alter table test4 add column test4 varchar(20); #新建一个字段test4,用于导入set子句计算的值
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
admin@localhost : xiaoboluo 06:27:56> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 06:28:02> set @test=200; #设置一个用户变量
Query OK, 0 rows affected (0.00 sec)
admin@localhost : xiaoboluo 06:30:32> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=round(@test/100,0) ; #执行导入,使用set子句导入test4列通过表达式\
round(@test/100,0)计算之后的值
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 06:30:52> select * from test4; #查看test4表中导入的数据,从以下结果中来看,导入数据正确
+----+------------------------------------------+--------+-------+-------+
| id | test | test2 | test3 | test4 |
+----+------------------------------------------+--------+-------+-------+
| 2 | a string | 100.20 | null | 2 |
| 4 | a string containing a , comma | 102.20 | NULL | 2 |
| 6 | a string containing a " quote | 102.20 | NULL | 2 |
| 8 | a string containing a ", quote and comma | 102.20 | NULL | 2 |
| 10 | \t | 102.20 | NULL | 2 |
| 14 | \t | 102.20 | NULL | 2 |
+----+------------------------------------------+--------+-------+-------+
6 rows in set (0.00 sec)
# SET子句可以将一个内部函数返回的值直接导入到一个指定列
admin@localhost : xiaoboluo 06:31:22> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 06:40:58> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=now() ;
Query OK, 6 rows affected (0.00 sec)
Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
admin@localhost : xiaoboluo 06:41:02> select * from test4;
+----+------------------------------------------+--------+-------+---------------------+
| id | test | test2 | test3 | test4 |
+----+------------------------------------------+--------+-------+---------------------+
| 2 | a string | 100.20 | null | 2017-05-03 18:41:02 |
| 4 | a string containing a , comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 6 | a string containing a " quote | 102.20 | NULL | 2017-05-03 18:41:02 |
| 8 | a string containing a ", quote and comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 10 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
| 14 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
+----+------------------------------------------+--------+-------+---------------------+
6 rows in set (0.00 sec)
使用指定列名或者变量列表时SET子句的使用受以下限制:
SET子句中的赋值表达式赋值运算符的左侧只能使用数据库表中的真实列名
您可以在SET子句中的右侧使用子查询。返回要分配给列的值的子查询可能仅是标量子查询。此外,在这个子查询中您不能使用load data语句正在操作的表
SET子句不会处理IGNORE子句忽略的行。
用固定行格式加载数据时,不能使用用户变量,因为用户变量值之间没有显示宽度
如果输入行的字段太多(多过表中的字段数量),则会忽略额外的字段,并增加警告数。如果输入行的字段太少,那么输入字段缺少的表列被设置为其默认值,在解析文本文件时,空串字段数据与缺少字段值不同(空串会直接作为数据插入,而缺少字段时,会根据字段定义的默认值进行填充),如下:
对于字符串类型,列设置为空字符串
对于数字类型,列设置为0
对于日期和时间类型,列将该类型设置为适当的“零”值
使用mysqldump批量导出生成表的txt文件,并使用mysqlimport批量导入表的txt文件到数据库中
2.1. 使用mysqldump批量导出
使用mysqldump导出数据为文本的语法如下:
mysqldump -u username -p'xxx' -T target_dir db_name tb_name [option];
其中option参数是以下几种可选参数:
--fields-terminated-by 'string' 字段分隔符
--fields-enclosed-by 'char' 字段引用符
--fields-optionally-enclosed-by 'char' 字段引用符,只在char,varchar,text等字段类型上生效
--fields-escaped-by 'char' 转义字符
--lines-terminated-by 'string' 记录结束符,即换行符
示例
$ mkdir /data/backup/
$ chown mysql.mysql /data/backup -R
$ mysqldump -uadmin -pletsg0 -h 10.10.30.241 --single-transaction --master-data=2 --triggers --routines --events xiaoboluo -T /data/backup/
mysqldump: [Warning] Using a password on the command line interface can be insecure.
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database.\
If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
SET @MYSQLDUMP_TEMP_LOG_BIN = @@SESSION.SQL_LOG_BIN;
SET @@SESSION.SQL_LOG_BIN= 0;
--
-- GTID state at the beginning of the backup
--
SET @@GLOBAL.GTID_PURGED='2016f827-2d98-11e7-bb1e-00163e407cfb:1-114';
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=21737; #使用-T选项时,--master-data=2打印的binlog pos信息会直接打印在标准输出上
--
-- Dumping events for database 'xiaoboluo'
--
--
-- Dumping routines for database 'xiaoboluo'
--
SET @@SESSION.SQL_LOG_BIN = @MYSQLDUMP_TEMP_LOG_BIN;
$ ll /data/backup/ #可以看到mysqldump -T选项批量导出的表数据除了文本文件之外,还同时备份了表结构
total 32
-rw-r--r-- 1 root root 1526 May 3 22:45 test2.sql
-rw-rw-rw- 1 mysql mysql 286 May 3 22:45 test2.txt
-rw-r--r-- 1 root root 1549 May 3 22:45 test3.sql
-rw-rw-rw- 1 mysql mysql 194 May 3 22:45 test3.txt
-rw-r--r-- 1 root root 1600 May 3 22:45 test4.sql
-rw-rw-rw- 1 mysql mysql 314 May 3 22:45 test4.txt
-rw-r--r-- 1 mysql mysql 1493 May 3 22:45 test.sql
-rw-rw-rw- 1 mysql mysql 95 May 3 22:45 test.txt
2.2. 使用mysqimport批量导出
2.2.1. mysqlimport语法及其参数说明
mysqlimport实用程序加载数据文件时,它通过向服务器发送LOAD DATA INFILE语句来实现(它实际是客户端提供了load data infile语句的一个命令行接口),可以使用--local选项使mysqlimport从客户端主机(而不是mysql server主机)读取数据文件。如果客户端和服务器支持压缩协议,则可以指定--compress选项以在慢速网络中获得更好的性能。
使用mysqlimport命令,语法如下:
mysqlimport -uroot -p 'xxx' [--local] db_name order_tab.txt [iption]
其中,option参数可以是如下选项
--fields-terminated-by=name 指定字段分隔符
--fields-enclosed-by=name 指定字段引用符
--fields-optionally-enclosed-by=name 指定字段引用符,但只在char、varchar、text字段上使用引用符
--fields-escaped-by=name 指定转义字符
--lines-terminated-by=name 指定行记录结束符(换行符)
--ignore-liens=number 忽略前几行
--low-priority 碰到有其他线程update操作操作的表与导入操作表相同时,延迟执行导入操作
-i, --ignore 如果碰到唯一键冲突就忽略冲突行导入
-r, --replace 如果碰到唯一键冲突就覆盖冲突行导入
-L, --local 从客户端主机加载数据文本文件
-C, --compress 在C/S模型之间使用压缩传输数据
-c, --columns=name 指定需要导入哪些列,与load data语句中一样需要指定表定义中真实的列名,有多个列名时使用逗号分隔
--default-character-set=name 设置使用该选项指定的字符集来解析文本文件中的内容
-h, --host 指定导入server的主机IP
-p, --password[=name] 指定导入server的用户密码
-P, --port=# 指定导入server的监听端口
--use-threads=# 指定多少个线程并发执行load data语句(实测单表时指定多线程时要比单线程要快,由于数据量小,测试出来的差别并不大,官方并没有说明是基于什么级别的并发,\
只写了一句:Load files in parallel using N threads,推测可能是基于类似mydumper的并发,但是多表导入时指定多线程就明显比单线程要快很多)
-u, --user=name 指定导入server的用户名
-d, --delete 指定导入操作之前先把表清空(实测重复导入时加了这个选项之后可以正常执行,,通过解析binlog发现,发现binlog中记录的第二次和第一次导入的语句完全相同是,\
第二次导入时如果发现表中有冲突数据,就先执行的不带where条件的delete,所有表先delete掉,然后再执行load data语句导入数据,另外,当与replace一起使用时,忽略replace选项)
2.2.2. mysqlimport用法演示示例
先执行清理server中表的数据
admin@localhost : (none) 11:08:58> use xiaoboluo
Database changed
admin@localhost : xiaoboluo 11:09:00> show tables;
+---------------------+
| Tables_in_xiaoboluo |
+---------------------+
| test |
| test2 |
| test3 |
| test4 |
+---------------------+
4 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:09:01> select * from test;
+----+------+-------+
| id | test | test2 |
+----+------+-------+
| 2 | 1 | 2 |
| 4 | 2 | NULL |
| 6 | null | NULL |
| 8 | 4 | NULL |
| 10 | | NULL |
| 12 | \\t | NULL |
| 14 | t | NULL |
| 16 | \t | NULL |
| 18 | t | NULL |
| 20 | NULL | NULL |
| 22 | "t | NULL |
+----+------+-------+
11 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:09:11> system cat /data/backup/test.txt;
2 1 2
4 2 \N
6 null \N
8 4 \N
10 \ \N
12 \\\\t \N
14 t \N
16 \\t \N
18 t \N
20 \N \N
22 "t \N
admin@localhost : xiaoboluo 11:12:08> select * from test2;
+----+------+-------+---------------------+
| id | test | test2 | dt |
+----+------+-------+---------------------+
| 2 | 1 | 2 | 2017-05-02 18:47:03 |
| 4 | 2 | NULL | 2017-05-02 18:47:03 |
| 6 | null | NULL | 2017-05-02 18:47:03 |
| 8 | 4 | NULL | 2017-05-02 18:47:03 |
| 10 | | NULL | 2017-05-02 18:47:03 |
| 12 | \\t | NULL | 2017-05-02 18:47:03 |
| 14 | t | NULL | 2017-05-02 18:47:03 |
| 16 | \t | NULL | 2017-05-02 18:47:03 |
| 18 | t | NULL | 2017-05-02 18:47:03 |
| 20 | NULL | NULL | 2017-05-02 18:47:03 |
+----+------+-------+---------------------+
10 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:12:15> system cat /data/backup/test2.txt;
2 1 2 2017-05-02 18:47:03
4 2 \N 2017-05-02 18:47:03
6 null \N 2017-05-02 18:47:03
8 4 \N 2017-05-02 18:47:03
10 \ \N 2017-05-02 18:47:03
12 \\\\t \N 2017-05-02 18:47:03
14 t \N 2017-05-02 18:47:03
16 \\t \N 2017-05-02 18:47:03
18 t \N 2017-05-02 18:47:03
20 \N \N 2017-05-02 18:47:03
admin@localhost : xiaoboluo 11:12:27> truncate test2;
Query OK, 0 rows affected (0.00 sec)
admin@localhost : xiaoboluo 11:12:32> select * from test3;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | null |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:12:44> system cat /data/backup/test3.txt;
2 a string 100.20 null
4 a string containing a , comma 102.20 \N
6 a string containing a " quote 102.20 \N
8 a string containing a ", quote and comma 102.20 \N
10 \\t 102.20 \N
14 \\t 102.20 \N
admin@localhost : xiaoboluo 11:12:59> truncate test3;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 11:13:03> select * from test4;
+----+------------------------------------------+--------+-------+---------------------+
| id | test | test2 | test3 | test4 |
+----+------------------------------------------+--------+-------+---------------------+
| 2 | a string | 100.20 | null | 2017-05-03 18:41:02 |
| 4 | a string containing a , comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 6 | a string containing a " quote | 102.20 | NULL | 2017-05-03 18:41:02 |
| 8 | a string containing a ", quote and comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 10 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
| 14 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
+----+------------------------------------------+--------+-------+---------------------+
6 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:13:15> system cat /data/backup/test4.txt;
2 a string 100.20 null 2017-05-03 18:41:02
4 a string containing a , comma 102.20 \N 2017-05-03 18:41:02
6 a string containing a " quote 102.20 \N 2017-05-03 18:41:02
8 a string containing a ", quote and comma 102.20 \N 2017-05-03 18:41:02
10 \\t 102.20 \N 2017-05-03 18:41:02
14 \\t 102.20 \N 2017-05-03 18:41:02
admin@localhost : xiaoboluo 11:13:24> truncate test4;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 11:13:28> flush logs;
Query OK, 0 rows affected (0.01 sec)
2.2.2.1. 单表导入
使用mysqlimport命令导入单张表
[root@5f1772e3-0c7a-4537-97f9-9b57cf6a04c2 ~]# mysqlimport -uadmin -pletsg0 -h20.10.30.241 xiaoboluo /data/backup/test.txt
mysqlimport: [Warning] Using a password on the command line interface can be insecure.
xiaoboluo.test: Records: 11 Deleted: 0 Skipped: 0 Warnings: 0
# 查看数据库中的数据
admin@localhost : xiaoboluo 11:13:42> select * from test;
+----+------+-------+
| id | test | test2 |
+----+------+-------+
| 2 | 1 | 2 |
| 4 | 2 | NULL |
| 6 | null | NULL |
| 8 | 4 | NULL |
| 10 | | NULL |
| 12 | \\t | NULL |
| 14 | t | NULL |
| 16 | \t | NULL |
| 18 | t | NULL |
| 20 | NULL | NULL |
| 22 | "t | NULL |
+----+------+-------+
11 rows in set (0.00 sec)
解析binlog查看里边如何记录的
$ mysqlbinlog -vv --base64-output=decode-rows mysql-bin.000006
.....
BEGIN
/*!*/;
# at 344
#170503 23:15:29 server id 3306241 end_log_pos 443 CRC32 0x4c1c8e8a Rows_query
# LOAD DATA INFILE '/data/backup/test.txt' INTO TABLE `test` IGNORE 0 LINES #mysqlimport内部调用的load data语句在这里
# at 443
#170503 23:15:29 server id 3306241 end_log_pos 501 CRC32 0x1ddc6d53 Table_map: `xiaoboluo`.`test` mapped to number 304
# at 501
#170503 23:15:29 server id 3306241 end_log_pos 631 CRC32 0xa8c4beab Write_rows: table id 304 flags: STMT_END_F
### INSERT INTO `xiaoboluo`.`test` #由于binlog_format=row,所以写到binlog中时内部把load data语句转换为了row格式
### SET
### @1=2 /* INT meta=0 nullable=0 is_null=0 */
### @2='1' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3='2' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=4 /* INT meta=0 nullable=0 is_null=0 */
### @2='2' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=6 /* INT meta=0 nullable=0 is_null=0 */
### @2='null' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=8 /* INT meta=0 nullable=0 is_null=0 */
### @2='4' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=10 /* INT meta=0 nullable=0 is_null=0 */
### @2='\x09' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=12 /* INT meta=0 nullable=0 is_null=0 */
### @2='\x5c\x5ct' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=14 /* INT meta=0 nullable=0 is_null=0 */
### @2='t' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=16 /* INT meta=0 nullable=0 is_null=0 */
### @2='\x5ct' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=18 /* INT meta=0 nullable=0 is_null=0 */
### @2='t' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=20 /* INT meta=0 nullable=0 is_null=0 */
### @2=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
### INSERT INTO `xiaoboluo`.`test`
### SET
### @1=22 /* INT meta=0 nullable=0 is_null=0 */
### @2='"t' /* VARSTRING(300) meta=300 nullable=1 is_null=0 */
### @3=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
# at 631
#170503 23:15:29 server id 3306241 end_log_pos 662 CRC32 0x0cd1a6ae Xid = 756
COMMIT/*!*/;
......
2.2.2.2.多表导入
清理掉test表,并刷新一下binlog
admin@localhost : xiaoboluo 11:32:19> truncate test;
Query OK, 0 rows affected (0.01 sec)
admin@localhost : xiaoboluo 11:35:09> flush logs;
Query OK, 0 rows affected (0.01 sec)
使用mysqlimport导入多表
[root@5f1772e3-0c7a-4537-97f9-9b57cf6a04c2 binlog]# time mysqlimport -uadmin -pletsg0 -h20.10.30.241 --replace xiaoboluo /data/backup/*.txt
mysqlimport: [Warning] Using a password on the command line interface can be insecure.
xiaoboluo.test2: Records: 10 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test3: Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test4: Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test: Records: 11 Deleted: 0 Skipped: 0 Warnings: 0
real 0m0.014s
user 0m0.002s
sys 0m0.002s
# 多表导入时可以使用参数--use-threads指定多个线程,明显比单线程导入速度要快
[root@5f1772e3-0c7a-4537-97f9-9b57cf6a04c2 binlog]# time mysqlimport -uadmin -pletsg0 -h20.10.30.241 --replace --use-threads=8 xiaoboluo /data/backup/*.txt
mysqlimport: [Warning] Using a password on the command line interface can be insecure.
xiaoboluo.test3: Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test2: Records: 10 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test: Records: 11 Deleted: 0 Skipped: 0 Warnings: 0
xiaoboluo.test4: Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
real 0m0.007s
user 0m0.006s
sys 0m0.002s
校验数据
admin@localhost : xiaoboluo 11:35:15> select * from test;
+----+------+-------+
| id | test | test2 |
+----+------+-------+
| 2 | 1 | 2 |
| 4 | 2 | NULL |
| 6 | null | NULL |
| 8 | 4 | NULL |
| 10 | | NULL |
| 12 | \\t | NULL |
| 14 | t | NULL |
| 16 | \t | NULL |
| 18 | t | NULL |
| 20 | NULL | NULL |
| 22 | "t | NULL |
+----+------+-------+
11 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:40:31> select * from test2;
+----+------+-------+---------------------+
| id | test | test2 | dt |
+----+------+-------+---------------------+
| 2 | 1 | 2 | 2017-05-02 18:47:03 |
| 4 | 2 | NULL | 2017-05-02 18:47:03 |
| 6 | null | NULL | 2017-05-02 18:47:03 |
| 8 | 4 | NULL | 2017-05-02 18:47:03 |
| 10 | | NULL | 2017-05-02 18:47:03 |
| 12 | \\t | NULL | 2017-05-02 18:47:03 |
| 14 | t | NULL | 2017-05-02 18:47:03 |
| 16 | \t | NULL | 2017-05-02 18:47:03 |
| 18 | t | NULL | 2017-05-02 18:47:03 |
| 20 | NULL | NULL | 2017-05-02 18:47:03 |
+----+------+-------+---------------------+
10 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:40:33> select * from test3;
+----+------------------------------------------+--------+-------+
| id | test | test2 | test3 |
+----+------------------------------------------+--------+-------+
| 2 | a string | 100.20 | null |
| 4 | a string containing a , comma | 102.20 | NULL |
| 6 | a string containing a " quote | 102.20 | NULL |
| 8 | a string containing a ", quote and comma | 102.20 | NULL |
| 10 | \t | 102.20 | NULL |
| 14 | \t | 102.20 | NULL |
+----+------------------------------------------+--------+-------+
6 rows in set (0.00 sec)
admin@localhost : xiaoboluo 11:40:34> select * from test4;
+----+------------------------------------------+--------+-------+---------------------+
| id | test | test2 | test3 | test4 |
+----+------------------------------------------+--------+-------+---------------------+
| 2 | a string | 100.20 | null | 2017-05-03 18:41:02 |
| 4 | a string containing a , comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 6 | a string containing a " quote | 102.20 | NULL | 2017-05-03 18:41:02 |
| 8 | a string containing a ", quote and comma | 102.20 | NULL | 2017-05-03 18:41:02 |
| 10 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
| 14 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
+----+------------------------------------------+--------+-------+---------------------+
6 rows in set (0.00 sec)
解析binlog查看(由于内容较多,这里就不贴出来了,binlog解析的多表导入操作在binlog中记录的就是一个表一个load data语句)
关于使用local子句与不使用local子句的时候的差异
如果load data语句使用了local子句,则客户端使用TCP远程连接mysql server时,没有file权限仍然能够导入文本文件,这个时候是非常危险的,因为local子句的内部原理是从客户端的主机读取文本文件并传送到server端的/tmp目录并保存为一个临时文件,再执行load data语句的。另外,要使用local子句,还需要看server端启动是否关闭了local_infile选项(如果不指定该选项,则服务端默认为ON),mysql client连接时是否关闭了local_infile选项(如果不指定该选项,则客户端默认为ON),local_infile在server或client端任意一端关闭都不能使用local子句,会报错误:ERROR 1148 (42000): The used command is not allowed with this MySQL version
如果load data语句不使用local子句,则这个时候用户必须要有file权限才能够执行导入文本文件(并且只能够导入server端的本地文本文件),如果没有file权限,可能报没有file权限的错误,也可能报错:ERROR 1045 (28000): Access denied for user 'test'@'%' (using password: YES)
如果不想这么麻烦(因为要限制客户端使用local子句在没有file权限的时候使用load data语句,需要在server端使用local_infile=OFF来关闭,不使用local子句时,如果用户没有file权限,那很显然不能够使用load data语句,但是如果还想限制由具有file权限的用户怎么办?),可以使用参数secure_file_priv=null,设置为null时,全面禁止使用load data语句(不管使用local子句还是不使用都不允许执行load data语句)
强调一点:在mysql的主备复制架构中,load data语句被认为是不安全的,要使得load data语句安全地进行复制,在binlog_format=mixed格式下会转为row格式记录,在binlog_format=statement时执行load data语句不会发出警告,而是内部通过一些列的流程来处理。具体是如何处理的呢,请看下回分解《load data语句如何保证主备复制数据一致性》
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。