Thursday, December 27, 2018

Matlab: WGET下载文件

Wget诞生于1996年,安装过程参考[1],下载地址在[2],注意设置环境变量。安装非常简易,简易教程参考[4]、[5]。

MODIS

MODIS数据确认后,系统自动发送邮件至邮箱,数据将在5天后删除。邮件正文包含订单编号及示例Wget下载语句,如Fig. 1。
Fig. 1
Wget语句,请注意三处,每次下载之时均需确认完整:第一是订单编号;第二是appKey;第三是下载地址,下载过程及appKey请参考说明,如下示例:
1
wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/orders/501290927/" --header "Authorization: Bearer 7E334C3A-079E-11E9-A5A4-CDB570C49BBF" -P F:\country.road\MODISDownload\output
如果Wget遇到错误“Disabling SSL due to encountered errors.”,2019年6月2日的解决方法是应用最新版“Wget1.20.3”.

References

Monday, September 17, 2018

Data: NUMBEO

NUMBEO基于互联网用户反馈得到全球各国(部分城市)数据,该数据近些年被Forbes, Business Insider, Time, The Economist, BBC, The New York Times, China Daily, The Telegraph等广泛引用。
主要数据:Cost of Living/Crime/Climate/Food Prices/Gas Prices/Health Care/Pollution/Property Prices/Quality of Life/Taxi Fare/Traffic

Monday, September 10, 2018

Matlab: 经纬坐标分布像元

Introduction

已知一些企业所在坐标,依据这些坐标可以将这些企业落实在具体地像元之上,由此可以创建企业属性专题空间分布图,比如像元值(Digital Numeric)表示企业数量,在该像元内企业的数量;表示注册资本,在该像元内注册资本的数量;表示招聘数量(发布数量、发布岗位数量),在该像元内企业发布数量或岗位数量等。一般,具备坐标的属性均可落实在像元上。

Note

像元代表性,一个像元代表特定的面积,比如一平方公里的像元依据坐标分布5家企业(公司),若在北京,这5家企业(公司)占地范围极有可能不超过像元范围(与像元范围相比,他们非常接近一个个点),因为北京地区多为服务业企业(公司),租用写字楼即是企业经营场所,多数写字楼还是高层建筑,可容纳多家企业,但在一些中西部地区,这样的一平方公里像元就可能需要再斟酌,因为他们多为农业加工、制造业企业,占地若干平方公里的企业也是屡见不鲜,这些地区应仔细考虑像元覆盖范围。

基本步骤

第一步,生成背景栅格。注意填充configuration.xlsx,像元数据类型依据具体情况,参考DataType.txt完成。
第二步,属性分布在像元。输入文件位于input目录,每一CSV文件前两列是名称、地址或其他你认为重要的属性,第三、四列依次是纬度、经度(不可调整),第五列是数值型属性,为数值之和(SUM),若不存在第五列像元数值则表示是每条坐标的数量之和(COUNT)。 输出栅格文件位于output,它与对应CSV文件同名。同时,输出log文件,记录一些不能顺利实现分布的数据属性,输出err文件,格式化整理这些数据的主要属性,以备将来重新批量查询。
Fig. 1
如Fig .1所示,(a)无第五列数值,所以(a)是每条坐标累计之和(COUNT),此处是像元中企业所在数量,若像元数值是10,就表示该像元范围有10家企业;(b)有第五列数值,所以(b)是属性数值累计之和(SUM),若像元数值是10000,就表示该像元范围内属性数值累计为10000,但至于该数值是多少条属性值累计得来,还需参考(a),可能是1条,也可能是多条。

Saturday, September 8, 2018

MAP API: 中文地址重定位经纬坐标

Introduction

在调用百度API之前,通过参考文献[1]获得一个开发秘钥(AK)。如Fig. 1所示,应用类型“浏览器端”,启用服务“全选”,Referer白名单可以先填“*”(此时任何网站均可访问,存在安全风险),待以后情况补充为特定网站(此时仅指定网站可以访问,安全性即可加强)。 参考文献[2]Python代码不能运行,但其结构、流程具有借鉴意义。
Fig. 1
中文地址不能直接加入URL发出申请,先行转换为URL编码。如上地十街10号,转制为URL编码即为“%E4%B8%8A%E5%9C%B0%E5%8D%81%E8%A1%9710%E5%8F%B7”(验证地址),基础是UTF-8编码。

BAIDU MAP API

我们使用百度API,以地址返回的是百度坐标(与真实WGS84坐标之间存在一定偏移),一个简单的替代方法减小这种偏移(不是消除偏移),请见参考文献[7]。输入及输出参数配置说明。地址最多支持84个字节等于(UTF-8)28个字,在转为URL编码之前需对地址稍加验证和处理,以符合要求。

时间复杂度

若并行处理(时间必然缩短),请先行注释如下代码:
1
2
3
4
5
6
7
8
if BDcoord>300
    disp('This script had been ended.');
    return;
elseif BDcoord==-1
    invalidlist=[invalidlist; ii];
    disp([num2str(ii), '. not found.']);
    continue;
end
测试情况:1000个地址,顺序查询用时约在5.8分钟。

不同用户每日查询上限

权益对比:未认证用户(6000次/日)、认证个人用户(30万次/日)、认证企业用户(300万次/日)

References

Wednesday, August 15, 2018

Matlab: Batch add fields and assign values to features

Introduction

Given a vector file, add attribute values to each vector unit of the file for further analysis. Please see Fig. 1 for the attribute table before and after. In the original attribute table, some field names are displayed abnormally due to encoding issues, so please use English characters for relevant characters.(已知一个矢量文件,为该文件的每一矢量单元增加属性数值,用于进一步分析。属性表前后请见Fig. 1。其中,原属性表某字段名称前后因编码问题而显示异常,因而涉及字符请使用英文字符。)
Fig. 1

Note

代码配置情况如下,每次运行前注意核对:
1.确认DBF文件标签列,对应index_col_locate;
2.默认数值型属性长度20位,小数点后6位,可调整,空格填充0;
3.默认字符型属性长度14位,以应用英文字符为宜,长度可调整,空格填充*;
4.一次匹配一个矢量文件,输出在Output目录。

References

Monday, August 6, 2018

MAP API: BAIDU MAP API地址解析

我在互联网上找到一个小工具,可以作为各大在线地图地址解析的统一接口,操作很简单。所以我以工具作为地址查询坐标的直接应用,而非链接百度地图API。这个小工具的下载及操作请见文献[4]。
注意:个别名称指示范围并不准确,建议应在名称前加上特定区域名称纠正此错误,如上海XX公司北京分公司,解析过程极易导致解析范围地位在上海地区,此解决方式是直接在企业名称前强制加上北京等区域名称,有必要在每次解析之前将区域名称冠到企业名称之前
这里需要说明的是,小工具选择使用百度地图完成地址解析,返回的是百度坐标。百度坐标是在GCJ02坐标系(俗称火星坐标系)之上再加上一次非线性偏移而得,记为BD09。这里,我们就明白了,百度坐标是经过两次偏移得到,还不能直接应用到分析过程,它还需还原至原始坐标。
1
2
3
4
5
6
7
%一个取巧的方法将百度坐标转换为原始坐标[2]:
%假设你有百度坐标:x1=116.397428,y1=39.90923 
%把这个坐标当成GPS坐标,通过接口获得他的百度坐标:x2=116.41004950566,%y2=39.916979519873
%通过计算就可以得到GPS的坐标: 
x=2*x1-x2,y=2*y1-y2 
x=116.38480649434001 
y=39.901480480127

References

Thursday, July 26, 2018

Arcpy: Spatial Join

Introduction

Points落在哪些Polygon之上,如Fig. 1?SpatialJoin_analysis解决之。
Fig. 1
注意,Points和Polygon文件属性表尽量简洁,留下关注的属性,不必要的属性在处理前删除。如Fig. 2和Fig. 3。
Fig. 2
Fig. 3
运行结束后,我们就可以了解到这些Points具体归属情况,通过属性表,见Fig. 4。
Fig. 4

References

Arcpy: Merge

Merge Computes a geometric union of the input features. All features and their attributes will be written to the output feature class.
ArcToolbox--Data Management --General--Merge
注意文件使用绝对路径,否则可能引起不可预知的问题。Files

Reference

Sunday, July 22, 2018

Report: Tool Packages

[1] Legend.
[3] 海量数据(两个矩阵)依据单一字段关联属性。此时,万万不可逐条循环匹配,时间过久。参考思路如下:
1
2
3
4
% A & B have two or more fields, maybe they both are cell matrices, using the funtion 'intersect'
% n or m means some numerical, such as 2, 3, etc.
[C, ia, ib]=intersect(A(:, n), B(:, m)));
otmat=[A(ia, :), B(ib, 2)];
[7] INDEX List.

Friday, July 20, 2018

SQL Language

[2] dt='20180709',该字段是数据库更新时间,若不选该字段,结果将返回过去时间的数据,尤其注意多表关联操作时为每一张表指定更新时间.
[3] 若数据量超限,则应考虑分块查询并下载,如定义抓取时间WHERE bbd_dotime<=to_date('2016-12-31','yyyy-mm-dd').
1
2
3
4
str = '長和';
fod = fopen('test.txt','wt', 'n', 'GBK');
fprintf(fod,'%s',str);
fclose(fod);
[5] 查询语言返回某表的全部字段,限定100条,示例例子.
1
2
3
4
SELECT *
FROM table_name
WHERE dt='updateday'
LIMIT 100;
[6] 部分字段内容为字符型,需将特定编码转换为字符再加入查询语言,如企业类型编码.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
aa=[9300 1151 1151];

staa=[];
for ii=1:numel(aa)
    a=aa(ii);
    
    if ii<numel(aa)
        staa=[staa, '''', num2str(a), ''', '];
    else
        staa=[staa, '''', num2str(a), ''''];
    end
end

% Output
>> staa
staa =
'9300', '1151', '1151'
>> 
[7] 部分数值型字段潜在混有字符型数值(如空格情形),M代码规避这类异常,如下:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
try
    % 'value' is the column of numeric value
    value=cell2mat(value);
catch
    % check the class of every cell
    indclass=cellfun(@class, value, 'UniformOutput',false);
    % some cell may be the 'char' data type
    indchar=cellfun(@strcmp, indclass, repmat({'char'}, size(indclass)));
    ind=find(indchar==1);
    % delete non-numberic value
    value(ind)=[];
    % convert to numberic matrix
    value=cell2mat(value);
end
[8] 动态调整单行输出字符串数量,如下:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
str1=[];
str2=[];
for ii=1:numel(input)
    % Construct Execute Sentence
    if ii==numel(input)
        str1=[str1, '%s\n'];
        str2=[str2, 'strtrim(num2str(input{', num2str(ii), '}))'];
    else
        str1=[str1, '%s,'];
        str2=[str2, 'strtrim(num2str(input{', num2str(ii), '})),'];
    end
end

cmd='fprintf(fod, ';
cmd=[cmd, '''', str1, ''', ', str2, ');'];
eval(cmd);
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
or operate_scope like '%音乐%'
or operate_scope like '%明星包装%'
or operate_scope like '%歌手%'
or operate_scope like '%影视%'
or operate_scope like '%动画%'
or operate_scope like '%后期制作%'
or operate_scope like '%艺人%'
or operate_scope like '%影片%'
or operate_scope like '%电影%'
or operate_scope like '%专题%'
or operate_scope like '%专栏%'
or operate_scope like '%综艺%'
or operate_scope like '%广播剧%'
or operate_scope like '%电视剧%'
or operate_scope like '%企业形象策划%'
or operate_scope like '%摄制电影%'
or operate_scope like '%影视项目的投资管理%'
or operate_scope like '%图书%'
or operate_scope like '%期刊%'
[10] 示例代码.
1
2
3
4
5
6
7
8
9
% 注册类型排除“个体”
substring(company_companytype,1,2) !='93'
% 规定时间范围
(esdate between date('2017-01-01') and date('2017-03-31') )
% 规定数据版本
dt='20190326'
% 字段(字符型)相似查询,注意可能会收录混杂非需求数据
salary like '%null%'
salary not like '%null%'
1
2
3
4
5
6
7
8
% 按照年月有序输出
select year(pubdate) as year, month(pubdate) as month, count(main) as num
from dw.shgy_zhaobjg
where dt='20190701'
and (pubdate between date('2019-01-01') and date('2019-06-30') )
and title like '%上海%'
group by year(pubdate), month(pubdate)
order by year,month
1
2
3
4
5
6
7
8
% 按照年月省份等字段有序输出
SELECT year(sentence_date) as year ,month(sentence_date) as month , province, case_type, count(distinct bbd_xgxx_id) as num
FROM dw.legal_adjudicative_documents
WHERE dt='20190708'
and case_type like '%刑事%'
and (sentence_date between date('2016-01-01') and date('2019-07-08') )
group by year(sentence_date)  ,month(sentence_date), province, case_type
order by year, month, province, case_type
1
2
% 划定查询范围
company_companytype in ('5800','5600','5200'....)
1
2
% 模糊查询字段
regexp_like(operate_scope, '(锅炉|内燃机|轮机)')
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
--关联两表
select a.company_county, b.company_scale, a.address, a.operate_scope, a.company_industry, company_gis_lat, company_gis_lon
from qyxx_basic a join (select bbd_qyxx_id, company_scale from dw.enterprisescaleout 
where dt ='20190802'
)b
on a.bbd_qyxx_id=b.bbd_qyxx_id 
and a.dt='20191018'
and a.company_county like '11%'
and a.company_type not like '%个体工商户%'
and a.company_type not like '%合作社%'
and regexp_like(a.enterprise_status, '存续|开业|在册|在营')
and (esdate between date('1900-01-01') and date('2015-12-31') )
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
--两张查询表格按照键值合并,可能有子表数据不包含在最终结果
SELECT *
FROM (
    VALUES

        (2, 4, 20)
) AS table_1 (key_A, key_B, y1)
LEFT JOIN (
    VALUES
        (1, 3, 100),
        (2, 4, 200)
) AS table_2 (key_A, key_B, y2)
USING (key_A, key_B)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
--重要参考
select c2.year2 as year0, c2.month2 as month0, c2.industry2 as industry0, c2.province2 as province0, c2.count1_cpws2 as chengtounum, c1.count1_cpws1 as quanbunum
 from
 (select b.year as year1, b.month as month1, a.company_industry as industry1, a.company_province as province1, count(distinct b.bbd_xgxx_id) as count1_cpws1
  from 
  (select company_industry, company_province, bbd_qyxx_id 
   from dw.qyxx_basic 
    where dt='20191106'
     and not regexp_like(company_type, '个体|农民|合作社')
     and not regexp_like(enterprise_status, '注销|吊销')
   ) a
  join
  (select bbd_qyxx_id, bbd_xgxx_id, year(sentence_date) as year , month(sentence_date) as month
   from dw.legal_adjudicative_documents
    where dt='20191106'
     and (to_char(sentence_date, 'yyyy-mm-dd')>='2017-01-01' )
     and bbd_qyxx_id is not null 
   )b
  on a.bbd_qyxx_id=b.bbd_qyxx_id
  group by b.year, b.month, a.company_province, a.company_industry
 ) c1
 right join
 (select b2.year as year2, b2.month as month2, a2.company_industry as industry2, a2.company_province as province2, count(distinct b2.bbd_xgxx_id) as count1_cpws2
  from 
  (select company_industry, company_province, bbd_qyxx_id 
   from dw.qyxx_basic 
    where dt='20191106'
     and not regexp_like(company_type, '个体|农民|合作社')
     and not regexp_like(enterprise_status, '注销|吊销')
     and regexp_like(company_name, '建设投资|建设开发|投资开发|投资控股|投资发展|投资集团|国有资产运营|国有资本经营')
     and company_companytype in ('1110', '1140', '1213', '1223', '3100', '3200', '3300', '3400', '3500')
   ) a2
  join
  (select bbd_qyxx_id, bbd_xgxx_id, year(sentence_date) as year , month(sentence_date) as month
   from dw.legal_adjudicative_documents
    where dt='20191106'
     and (to_char(sentence_date, 'yyyy-mm-dd')>='2017-01-01' )
     and bbd_qyxx_id is not null 
   )b2
  on a2.bbd_qyxx_id=b2.bbd_qyxx_id
  group by b2.year, b2.month, a2.company_province, a2.company_industry
 ) c2
 on c1.year1=c2.year2 and c1.month1=c2.month2 and c1.industry1=c2.industry2 and c1.province1=c2.province2
order by c2.year2,c2.month2,c2.industry2,c2.province2
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
--重点体会CASE用法,注意前SELECT后GROUP BY
SELECT (case when a.company_county like '1101%' then '北京市'
      when a.company_county like '1201%' then '天津市'
      when a.company_county like '1301%' then '石家庄市'
      when a.company_county like '1501%' then '呼和浩特市'
      when a.company_county like '1401%' then '太原市'
      when a.company_county like '2101%' then '沈阳市'
      when a.company_county like '2102%' then '大连市'
      when a.company_county like '2201%' then '长春市'
      when a.company_county like '2301%' then '哈尔滨市'
      when a.company_county like '3101%' then '上海市'
      when a.company_county like '3201%' then '南京市'
      when a.company_county like '3301%' then '杭州市'
      when a.company_county like '3302%' then '宁波市'
      when a.company_county like '3401%' then '合肥市'
      when a.company_county like '3501%' then '福州市'
      when a.company_county like '3502%' then '厦门市'
      when a.company_county like '3601%' then '南昌市'
      when a.company_county like '3701%' then '济南市'
      when a.company_county like '3702%' then '青岛市'
      when a.company_county like '4101%' then '郑州市'
      when a.company_county like '4201%' then '武汉市'
      when a.company_county like '4301%' then '长沙市'
      when a.company_county like '4401%' then '广州市'
      when a.company_county like '4403%' then '深圳市'
      when a.company_county like '4501%' then '南宁市'
      when a.company_county like '4601%' then '海口市'
      when a.company_county like '500%' then '重庆市'
      when a.company_county like '5101%' then '成都市'
      when a.company_county like '5201%' then '贵阳市'
      when a.company_county like '5301%' then '昆明市'
      when a.company_county like '6101%' then '西安市'
      when a.company_county like '6201%' then '兰州市'
      when a.company_county like '6301%' then '西宁市'
      when a.company_county like '6401%' then '银川市'
      when a.company_county like '6501%' then '乌鲁木齐市'
      when a.company_county like '5401%' then '拉萨市'
      else '其他' end) as city, count(b.company_name)
from dw.qyxx_basic a right join dw.recruit b
on a.bbd_qyxx_id=b.bbd_qyxx_id 
where b.pubdate >=to_date('2017-01-01','yyyy-mm-dd') and b.pubdate <=to_date('2017-03-31','yyyy-mm-dd')
and b.dt='20180723'
and a.dt='20180723'
group by (case when a.company_county like '1101%' then '北京市'
      when a.company_county like '1201%' then '天津市'
      when a.company_county like '1301%' then '石家庄市'
      when a.company_county like '1501%' then '呼和浩特市'
      when a.company_county like '1401%' then '太原市'
      when a.company_county like '2101%' then '沈阳市'
      when a.company_county like '2102%' then '大连市'
      when a.company_county like '2201%' then '长春市'
      when a.company_county like '2301%' then '哈尔滨市'
      when a.company_county like '3101%' then '上海市'
      when a.company_county like '3201%' then '南京市'
      when a.company_county like '3301%' then '杭州市'
      when a.company_county like '3302%' then '宁波市'
      when a.company_county like '3401%' then '合肥市'
      when a.company_county like '3501%' then '福州市'
      when a.company_county like '3502%' then '厦门市'
      when a.company_county like '3601%' then '南昌市'
      when a.company_county like '3701%' then '济南市'
      when a.company_county like '3702%' then '青岛市'
      when a.company_county like '4101%' then '郑州市'
      when a.company_county like '4201%' then '武汉市'
      when a.company_county like '4301%' then '长沙市'
      when a.company_county like '4401%' then '广州市'
      when a.company_county like '4403%' then '深圳市'
      when a.company_county like '4501%' then '南宁市'
      when a.company_county like '4601%' then '海口市'
      when a.company_county like '500%' then '重庆市'
      when a.company_county like '5101%' then '成都市'
      when a.company_county like '5201%' then '贵阳市'
      when a.company_county like '5301%' then '昆明市'
      when a.company_county like '6101%' then '西安市'
      when a.company_county like '6201%' then '兰州市'
      when a.company_county like '6301%' then '西宁市'
      when a.company_county like '6401%' then '银川市'
      when a.company_county like '6501%' then '乌鲁木齐市'
      when a.company_county like '5401%' then '拉萨市'
      else '其他' end)
1
2
3
4
5
% 字符规范化
dat=cellfun(@(x) char(x), dat, 'UniformOutput', false);
dat=cellfun(@(x) strtrim(x), dat, 'UniformOutput', false);
dat(:, 1)=cellfun(@(x) [x, '*****'], dat(:, 1), 'UniformOutput', false);
dat(:, 1)=cellfun(@(x) x(1:3), dat(:, 1), 'UniformOutput', false);
1
2
3
4
5
6
% 解析最底层Cell
newinp=tempinp;
    while any(cellfun(@iscell, newinp))
        newinp= [newinp{cellfun(@iscell, newinp)} newinp(~cellfun(@iscell, newinp))];
    end
newinp=newinp';
1
2
--显示dt列表
show partitions from qyxx_wanfang_zhuanli
1
2
3
4
5
6
7
8
9
--存续企业示例
SELECT count(bbd_qyxx_id)
FROM dw.qyxx_basic
WHERE dt='20200720'
and not regexp_like(company_type , '个体|农民专业合作|合作社')
and operate_scope like '%水泥%'
and not regexp_like(operate_scope, '石膏|预制构件|轻质建筑材料|石棉|玻璃纤维|木材|玻璃钢')
and (esdate between date('1900-01-01') and date('2014-12-31') )
and ((company_enterprise_status like '存续') or (cancel_date between date('2014-01-01') and date('2014-12-31')) )
1
2
3
4
5
6
7
8
--存续企业示例之二
select count(distinct bbd_qyxx_id) as company_num
from (select esdate,cancel_date,revoke_date,bbd_qyxx_id
from dw.qyxx_basic
where dt = '20210505' 
and esdate<= date('2020-12-31')
and (cancel_date> date('2020-12-31') or cancel_date is null)
and (revoke_date> date('2020-12-31') or revoke_date is null)
1
2
3
4
5
6
7
--企业筛选条件
--1、剔除个体户和农专社
and company_companytype not in ('9100','9200','9300','9310','9320')
--2、剔除个体工商户、外资企业、港澳台企业等
and company_companytype not in ('9100','9200','9300','9310','9320','5000','5500','5100','5110','5120','5130','5140','5150','5160','5190','5200','5210','5220','5230','5240','5290','5300','5310','5390','5400','5410','5420','5430','5490','5800','5810','5820','5830','5840','5890','6000','6100','6110','6120','6130','6140','6150','6160','6170','6190','6200','6210','6220','6230','6240','6250','6260','6270','6290','6300','6310','6320','6390','6400','6410','6420','6430','6490','6500','6510','6600','6610','6700','6710','6800','6810','6820','6830','6840','6890','7000','7100','7110','7120','7130','7190','7200','7300','7310','7390','9925','9926','9927','9929','9931','9936','9937','9941','1120','1121','1122','1123','1211','1221','2120','2121','2122','2123','2211','2221')
--3、保留存续企业
and company_enterprise_status in ('正常','存续','开业','个体转企业','待迁入','迁入')