mysql 关于大数据量日志表的优化过程
我们知道很多网站都会有关于记录网站搜索的日志表,用于记录会员的基本登录信息,用于后期数据分析或者防攻击使用,现在我们有一张表logs,每天产生大概60万的数据。
环境介绍:10.0.21-MariaDB-log
我们的业务主要分成三块:
1)查询当天的logs;
2)查看最近2个月的logs;
3)查看最近一年的logs.
我们当前的表是不管点那个按钮查询的都是这个表logs,这个表的数据一直存在,直到收到反馈报错504 timeout,然后才去处理这个表,就是删除一年之前的数据,俩月就得处理一次,很蛋疼,于是想着优化下。
整体的思路就是:分表和分区表的结合。
1)查询当天和近两个月的数据的业务去查询logs;
2)查看最近一年的数据的业务去查询另一个表logs_oneyear,
3)然后利用存储过程+event的方式定时去处理logs和logs_oneyear,使这两个表里始终只保留最近2个月以及近10个月的数据,这样保证了可以查询到一年数据的目的。
具体处理过程如下:
一:首先处理保存2个月的表logs,打算按时间分区,7天一个分区,利用存储过程logs_twomonths_procedure删除一个老分区,创建一个新分区,并且把删除的老分区的数据,插入到logs_oneyear,然后做一个event每7天定时执行,
1)创建表logs的语句如下,这里需要注意的是针对timestamp类型的时间属性需要借助函数unix_timestamp才能创建分区表,但是DATETIME 类型不需要。
CREATE TABLE `logs` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`member_id` int(11) DEFAULT NULL,
`jsession` int(11) DEFAULT NULL,
`ip` bigint(20) DEFAULT NULL,
`shijian` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` int(11) NOT NULL DEFAULT '0' COMMENT 'When the page(html) is open ,this attribute will set 1',
PRIMARY KEY (`id`,`shijian`),
KEY `logs_member_idx` (`member_id`,`jsession`,`shijian`),
KEY `logs_ip_idx` (`ip`),
KEY `logs_shijian_idx` (`shijian`)
) ENGINE=MyISAM AUTO_INCREMENT=847368831411249480 DEFAULT CHARSET=utf8
PARTITION BY RANGE(unix_timestamp(shijian))
(
PARTITION p20170201 VALUES LESS THAN (unix_timestamp('2017-02-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20170207 VALUES LESS THAN (unix_timestamp('2017-02-07 00:00:00')) ENGINE = MyISAM,
PARTITION p20170214 VALUES LESS THAN (unix_timestamp('2017-02-14 00:00:00')) ENGINE = MyISAM,
PARTITION p20170221 VALUES LESS THAN (unix_timestamp('2017-02-21 00:00:00')) ENGINE = MyISAM,
PARTITION p20170228 VALUES LESS THAN (unix_timestamp('2017-02-28 00:00:00')) ENGINE = MyISAM,
PARTITION p20170307 VALUES LESS THAN (unix_timestamp('2017-03-07 00:00:00')) ENGINE = MyISAM,
PARTITION p20170314 VALUES LESS THAN (unix_timestamp('2017-03-14 00:00:00')) ENGINE = MyISAM,
PARTITION p20170321 VALUES LESS THAN (unix_timestamp('2017-03-21 00:00:00')) ENGINE = MyISAM,
PARTITION p20170328 VALUES LESS THAN (unix_timestamp('2017-03-28 00:00:00')) ENGINE = MyISAM,
PARTITION p20170404 VALUES LESS THAN (unix_timestamp('2017-04-04 00:00:00')) ENGINE = MyISAM,
PARTITION pmax VALUES LESS THAN (MAXVALUE) ENGINE = MyISAM);
2)如下创建存储过程logs_twomonths_procedure的语句:
MariaDB [log]> drop procedure if exists logs_twomonths_procedure;
MariaDB [log]>
DELIMITER $$
create procedure logs_twomonths_procedure()
begin
/* 事务回滚*/
declare exit handler for sqlexception rollback;
start TRANSACTION;
/* 到系统表查出这个表的最大分区,得到最大分区的日期。在创建分区的时候,名称就以日期格式存放,方便后面维护,p12_name是自己定义变量,注意需要条件partition_name!='pmax'*/
select REPLACE(partition_name,'p','') into @P12_Name from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs' and partition_name!='pmax' order by partition_ordinal_position DESC limit 1;
/*直接加7天,就是得到7天之后的日期。 +0 是为了把日期都格式化成YYYYMMDD这样的格式*/
set @Max_date= date(DATE_ADD(@P12_Name+0, INTERVAL 7 DAY))+0;
/* 修改表,在最大分区的后面增加一个分区,时间范围添加7天 ,DATE函数是把20110101改成日期格式2011-01-01,语句中两个单引号代表一个单引号 ,repare预处理固定语法 */
SET @s1=concat('ALTER TABLE logs_oneyear ADD PARTITION (PARTITION p',@Max_date,' VALUES LESS THAN (unix_timestamp(''',date(@Max_date),''')))');
PREPARE hezi FROM @s1;
EXECUTE hezi ;
DEALLOCATE PREPARE hezi ;
/* 将最小分区中的数据插入到logs_oneyear表中,因为接下来要删除这个最小分区会同时删除分区内的数据,慎重 */
select REPLACE(partition_name,'p','') into @min_date from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs' order by partition_ordinal_position asc limit 1;
SET @s2=concat('insert into logs_oneyear select * from logs where shijian<''',date(@min_date),'''');
PREPARE hezi1 FROM @s2;
EXECUTE hezi1 ;
DEALLOCATE PREPARE hezi1 ;
/* 取出最小的分区的名称,并删除掉 */
select partition_name into @P0_Name from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs' order by partition_ordinal_position limit 1;
SET @s3=concat('ALTER TABLE logs DROP PARTITION ',@P0_Name);
PREPARE hezi3 FROM @s3;
EXECUTE hezi3 ;
DEALLOCATE PREPARE hezi3;
/* 提交 */
COMMIT ;
end;
$$
3)创建event,7天执行一次,应该在log库下创建,
mysql的库相当于oracle的schema,
MariaDB [log]> CREATE EVENT logs_Partition7
ON SCHEDULE
EVERY 7 day STARTS '2017-03-31 08:00:00'
DO
call logs_twomonths_procedure();
二:处理报错10个月数据的那个表logs_oneyear.
1)创建logs_oneyear表的语句,半月一个分区
MariaDB [log]>drop table if exists logs_oneyear;
MariaDB [log]>CREATE TABLE `logs_oneyear` (
`disc` int(11) NOT NULL,
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`member_id` int(11) DEFAULT NULL,
`jsession` int(11) DEFAULT NULL,
`ip` bigint(20) DEFAULT NULL,
`keyword` varchar(255) DEFAULT NULL,
`page` int(11) DEFAULT NULL,
`rp` int(11) DEFAULT NULL,
`area` varchar(255) DEFAULT NULL,
`category` varchar(255) DEFAULT NULL,
`fenlei` int(11) DEFAULT NULL,
`shijianduan` int(11) DEFAULT NULL,
`type` int(11) DEFAULT NULL,
`agency_company` varchar(55) DEFAULT NULL,
`biaoxun_type` int(11) DEFAULT NULL,
`caigou_type` int(11) DEFAULT NULL,
`shijian` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`total` int(11) DEFAULT NULL,
`xmxx_jieduan` int(11) DEFAULT NULL,
`zhanguan` varchar(255) DEFAULT NULL,
`main_product` varchar(255) DEFAULT NULL,
`fazx_fenlei1` int(11) DEFAULT NULL,
`fazx_fenlei2` int(11) DEFAULT NULL,
`jiaoyileixing` int(11) DEFAULT NULL,
`mailid` bigint(20) DEFAULT NULL,
`referer` int(11) DEFAULT NULL,
`search_id` bigint(20) DEFAULT NULL,
`info_id` bigint(20) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '0' COMMENT 'When the page(html) is open ,this attribute will set 1',
PRIMARY KEY (`id`,`shijian`),
KEY `logs_disc_idx` (`disc`),
KEY `logs_member_idx` (`member_id`,`jsession`,`shijian`),
KEY `logs_ip_idx` (`ip`),
KEY `logs_shijian_idx` (`shijian`),
KEY `logs_mail_idx` (`mailid`)
) ENGINE=MyISAM AUTO_INCREMENT=847368831411249480 DEFAULT CHARSET=utf8
PARTITION BY RANGE(unix_timestamp(shijian))
(
PARTITION p20160501 VALUES LESS THAN (unix_timestamp('2016-05-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20160515 VALUES LESS THAN (unix_timestamp('2016-05-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20160601 VALUES LESS THAN (unix_timestamp('2016-06-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20160615 VALUES LESS THAN (unix_timestamp('2016-06-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20160701 VALUES LESS THAN (unix_timestamp('2016-07-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20160715 VALUES LESS THAN (unix_timestamp('2016-07-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20160801 VALUES LESS THAN (unix_timestamp('2016-08-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20160815 VALUES LESS THAN (unix_timestamp('2016-08-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20160901 VALUES LESS THAN (unix_timestamp('2016-09-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20160915 VALUES LESS THAN (unix_timestamp('2016-09-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20161001 VALUES LESS THAN (unix_timestamp('2016-10-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20161015 VALUES LESS THAN (unix_timestamp('2016-10-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20161101 VALUES LESS THAN (unix_timestamp('2016-11-01 00:00:00')) ENGINE = MyISAM ,
PARTITION p20161115 VALUES LESS THAN (unix_timestamp('2016-11-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20161201 VALUES LESS THAN (unix_timestamp('2016-12-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20161215 VALUES LESS THAN (unix_timestamp('2016-12-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20170101 VALUES LESS THAN (unix_timestamp('2017-01-01 00:00:00')) ENGINE = MyISAM,
PARTITION p20170115 VALUES LESS THAN (unix_timestamp('2017-01-15 00:00:00')) ENGINE = MyISAM,
PARTITION p20170201 VALUES LESS THAN (unix_timestamp('2017-02-01 00:00:00'))
ENGINE = MyISAM,
PARTITION p20170215 VALUES LESS THAN (unix_timestamp('2017-02-15 00:00:00'))
ENGINE = MyISAM,
PARTITION pmax1 VALUES LESS THAN (MAXVALUE) ENGINE = MyISAM));
);
2)创建存储过程logs_oneyear_procedure,
MariaDB [log]>drop procedure if exists logs_oneyear_procedure;
DELIMITER $$
MariaDB [log]>create procedure logs_oneyear_procedure()
begin
declare exit handler for sqlexception rollback;
start TRANSACTION;
select REPLACE(partition_name,'p','') into @liu_Name from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs_oneyear' and partition_name!='pmax1' order by partition_ordinal_position DESC limit 1;
/* 判断最大分区的时间段,如果是15号,那么根据情况需要加,14,15,16,17天,,如果是1号就直接加14即可,这样的目的就是保证产生的分区间隔是每个月的15号和下个月的的1号,说白了 就是为了规范*/
IF (DAY(@P12_Name)=15) THEN
CASE day(LAST_DAY(@P12_name))
WHEN 31 THEN set @Max_date= date(DATE_ADD(@P12_Name+0,INTERVAL 17 DAY))+0 ;
WHEN 30 THEN set @Max_date= date(DATE_ADD(@P12_Name+0,INTERVAL 16 DAY))+0 ;
WHEN 29 THEN set @Max_date= date(DATE_ADD(@P12_Name+0,INTERVAL 15 DAY))+0 ;
WHEN 28 THEN set @Max_date= date(DATE_ADD(@P12_Name+0,INTERVAL 14 DAY))+0 ;
END CASE;
ELSE
set @Max_date= date(DATE_ADD(@P12_Name+0, INTERVAL 14 DAY))+0;
END IF;
SET @s1=concat('ALTER TABLE logs_oneyear ADD PARTITION (PARTITION p',@Max_date,' VALUES LESS THAN (unix_timestamp(''',date(@Max_date),''')))');
PREPARE stmt2 FROM @s1;
EXECUTE stmt2;
DEALLOCATE PREPARE stmt2;
/* 将最小分区中的数据插入到历史库下的备份表logs_back_start20160501中,因为接下来要删除这个最小分区会同时删除分区内的数据,慎重 */
select REPLACE(partition_name,'p','') into @min_date from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs_oneyear' order by partition_ordinal_position asc limit 1;
SET @s2=concat('insert into history.logs_back_start20160501 select * from logs_oneyear where shijian<''',date(@min_date),'''');
PREPARE stmt3 FROM @s2;
EXECUTE stmt3;
DEALLOCATE PREPARE stmt3;
select partition_name into @P0_Name from INFORMATION_SCHEMA.PARTITIONS where TABLE_SCHEMA='log' and table_name='logs_oneyear' order by partition_ordinal_position limit 1;
SET @s=concat('ALTER TABLE logs_oneyear DROP PARTITION ',@P0_Name);
PREPARE stmt1 FROM @s;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
COMMIT ;
end;
$$
3)创建event,每隔15天执行一次。
计划任务代码:
CREATE EVENT logs_Partition_oneyear
ON SCHEDULE
EVERY 15 day STARTS '2017-04-01 22:30:00'
DO
call logs_oneyear_procedure();
最后把相应的数据insert到logs和logs_oneyear即可,至此处理结束,观察一段时间,性能果然有所提升,需要注意的是,如果查询的语句中where条件中没有使用shijian字段,那么表做分区性能提升不大,还影响了insert的效率,所以具体该不该分区,或者应该使用什么字段分区,需要自己去探究。
引申内容:
1)关于时间字段datetime和timestamp的异同点,如果可以尽量使用TIMESTAMP:
相同
显示
TIMESTAMP列的显示格式与DATETIME列相同。换句话说,显示宽度固定在19字符,并且格式为YYYY-MM-DD HH:MM:SS。
不同
范围
datetime 以'YYYY-MM-DD HH:MM:SS'格式检索和显示DATETIME值。支持的范围为'1000-01-01 00:00:00'到'9999-12-31 23:59:59'TIMESTAMP值不能早于1970或晚于2037
储存
TIMESTAMP
1.4个字节储存(Time stamp value is stored in 4 bytes)
2.值以UTC格式保存( it stores the number of milliseconds)
3.时区转化 ,存储时对当前的时区进行转换,检索时再转换回当前的时区。
datetime
1.8个字节储存(8 bytes storage)
2.实际格式储存(Just stores what you have stored and retrieves the same thing which you have stored.)
3.与时区无关(It has nothing to deal with the TIMEZONE and Conversion.)
2)关于mysql预处理语句
其用法十分简单,其中stmt_name 是随便起的名字。
PREPARE stmt_name FROM preparable_stmt
EXECUTE stmt_name
[USING @var_name [, @var_name] ...] -
{DEALLOCATE | DROP} PREPARE stmt_name
使用PAREPARE STATEMENT可以减少每次执行SQL的语法分析,比如用于执行带有WHERE条件的SELECT和DELETE,或者UPDATE,或者INSERT,只需要每次修改变量值即可。同样可以防止SQL注入,参数值可以包含转义符和定界符。适用在应用程序中,或者SQL脚本中均可