Data Analysis: Threat Intelligence Data Analysis (2) 郝伟 2021/05/25 [TOC]
1. File Types
First, a supplement to yesterday's file classification. A small change to the code yields the following grouping:
- Files containing a single JSON object
  - url_c2-20190814day-actively.json
- Files with one JSON object per line
  - domain_c2-20190814day-actively.json
  - domain_reputation-20190814day-actively.json
  - email_reputation-20190814day-actively.json
  - email_spamming-20190814day-actively.json
  - hash_reputation-20190814day-actively.json
  - ip_c2-20190814day-actively.json
- Files with multiple JSON objects, each spanning multiple lines
  - ip_proxy-20190814day-actively.json
  - ip_proxy-20190814day-actively1.josn
  - ip_reputation-20190814day-actively.json
  - ip_spamming-20190814day-actively.json
  - ip_tor-20190814day-actively.json
  - url_c2-20190814day-actively.json
  - url_phishing-20190814day-actively.json
  - url_reputation-20190814day-actively.json
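The three layouts above can be told apart mechanically. The following is a minimal sketch, not the code used yesterday: the function name `classify_json_layout` and its return labels are made up for illustration, and it assumes the whole file fits in memory as one string.

```python
import json


def classify_json_layout(text: str) -> str:
    """Classify text as 'single', 'one-per-line' or 'multi-line-objects'."""
    decoder = json.JSONDecoder()
    spans = []  # (start, end) offsets of every top-level JSON value
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        _, end = decoder.raw_decode(text, pos)
        spans.append((pos, end))
        pos = end
    if not spans:
        return "empty"
    if len(spans) == 1:
        return "single"
    # Several values, none of which contains a line break.
    if all("\n" not in text[start:end] for start, end in spans):
        return "one-per-line"
    return "multi-line-objects"
```

A file would be classified with something like `classify_json_layout(open(path, encoding="utf-8").read())`; the encoding is also an assumption.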
2. Data Summary
Yesterday's output was filtered down to just the path and its count and saved as a CSV file. Part of the content is shown here:
Path,Count
./attackerInfo,38
./attackerInfo,265
./attackerInfo,45
./attackerInfo,1
./attackerInfo,1464
./attackerInfo,1464
./attackerInfo,9162
./attackerInfo,495
./attackerInfo,307
./attackerInfo,1
./attackerInfo,795
./attackerInfo,4036
./basicInfo,38
./basicInfo,265
./basicInfo,618
./basicInfo,452
...(several lines omitted)...
./threatIntelligence/[*]/type,4
./whois,38
./whois,265
./whois,45
./whois,1
./whois,1464
./whois,1464
./whois,9162
./whois,495
./whois,307
./whois,1
./whois,795
./whois,4036
The CSV is then analyzed with the following code.
import os
import pandas as pd

# Load the Path/Count CSV that sits next to this script.
datafile = os.path.join(os.path.dirname(__file__), 'data1.csv')
df = pd.read_csv(datafile)

# Print one row per distinct path together with the sum of its counts.
print('{0}\t{1}\t{2}'.format('ID', 'Path'.ljust(50, ' '), 'Count'))
for row_id, (path, group) in enumerate(df.groupby('Path'), start=1):
    print('{0:0>2d}\t{1}\t{2}'.format(row_id, path.ljust(50, ' '), group['Count'].sum()))
This produces the following result:
01 ./attackerInfo 18073
02 ./basicInfo 23759
03 ./basicInfo/attackAction 23759
04 ./basicInfo/attackInProtocol 19143
05 ./basicInfo/data 23759
06 ./basicInfo/dataType 23759
07 ./basicInfo/firstTime 23759
08 ./basicInfo/lastTime 23759
09 ./basicInfo/location 12937
10 ./basicInfo/location/cityName 12937
11 ./basicInfo/location/countryCode 12937
12 ./basicInfo/location/countryName 12937
13 ./basicInfo/location/latitude 12937
14 ./basicInfo/location/longitude 12937
15 ./basicInfo/location/provinceName 12937
16 ./basicInfo/malwareClass 23759
17 ./basicInfo/origin 4616
18 ./basicInfo/origin/md5 4616
19 ./basicInfo/origin/sha1 4616
20 ./basicInfo/origin/sha256 4616
21 ./basicInfo/tags 23759
22 ./basicInfo/total 23759
23 ./linkedAnalysis 23759
24 ./threatIntelligence 23759
25 ./threatIntelligence/[*]/DIRPort 372
26 ./threatIntelligence/[*]/ORPort 372
27 ./threatIntelligence/[*]/activeTime 23759
28 ./threatIntelligence/[*]/anonymity 3312
29 ./threatIntelligence/[*]/channel 23759
30 ./threatIntelligence/[*]/class 5
31 ./threatIntelligence/[*]/description 1
32 ./threatIntelligence/[*]/domain 15
33 ./threatIntelligence/[*]/email 743
34 ./threatIntelligence/[*]/exit 372
35 ./threatIntelligence/[*]/ip 169
36 ./threatIntelligence/[*]/level 23759
37 ./threatIntelligence/[*]/port 3313
38 ./threatIntelligence/[*]/reportDesc 2
39 ./threatIntelligence/[*]/reportName 2
40 ./threatIntelligence/[*]/saveTime 24
41 ./threatIntelligence/[*]/server 372
42 ./threatIntelligence/[*]/source 24
43 ./threatIntelligence/[*]/tags 18
44 ./threatIntelligence/[*]/target 1584
45 ./threatIntelligence/[*]/type 3312
46 ./whois 18073
As the result shows, the data contains 46 distinct paths in total.
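For reference, per-path counts of this kind could be produced from parsed records roughly as follows. This is a sketch under assumptions, not yesterday's code: the helper names `iter_paths` and `count_paths` are hypothetical, list elements are collapsed into a `[*]` segment as in the table, and each path is counted at most once per record, which is a guess at the counting rule.

```python
from collections import Counter


def iter_paths(node, prefix="."):
    """Yield the path of every key under node; list items share the '[*]' segment."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            yield path
            yield from iter_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from iter_paths(item, f"{prefix}/[*]")


def count_paths(records):
    """Count, for each path, the number of records in which it occurs."""
    counts = Counter()
    for record in records:
        # Assumption: a path counts once per record, however often it repeats.
        counts.update(set(iter_paths(record)))
    return counts
```

Empty lists such as `"attackerInfo": []` still contribute their own path, which is why `./attackerInfo` and `./whois` appear in the table even though they hold no data.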
3. Building the Structure
From the analysis above, we can derive the following directory structure:
- attackerInfo
- basicInfo
  - attackAction
  - attackInProtocol
  - data
  - dataType
  - firstTime
  - lastTime
  - location
    - cityName
    - countryCode
    - countryName
    - latitude
    - longitude
    - provinceName
  - malwareClass
  - origin
    - md5
    - sha1
    - sha256
  - tags
  - total
- linkedAnalysis
- threatIntelligence
  - [*]/DIRPort
  - [*]/ORPort
  - [*]/activeTime
  - [*]/anonymity
  - [*]/channel
  - [*]/class
  - [*]/description
  - [*]/domain
  - [*]/email
  - [*]/exit
  - [*]/ip
  - [*]/level
  - [*]/port
  - [*]/reportDesc
  - [*]/reportName
  - [*]/saveTime
  - [*]/server
  - [*]/source
  - [*]/tags
  - [*]/target
  - [*]/type
- whois
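A nested listing like this can be generated from the flat path list instead of being written by hand. The sketch below is illustrative only: `build_tree` and `render_tree` are hypothetical helpers, and the choice to keep `[*]` attached to the following segment is made here simply to match the listing.

```python
def build_tree(paths):
    """Turn './a/b' style paths into nested dicts keyed by path segment."""
    tree = {}
    for path in paths:
        merged = []
        for segment in path.split("/")[1:]:  # drop the leading '.'
            if merged and merged[-1] == "[*]":
                merged[-1] = "[*]/" + segment  # keep '[*]' glued to its field
            else:
                merged.append(segment)
        node = tree
        for segment in merged:
            node = node.setdefault(segment, {})
    return tree


def render_tree(tree, depth=0):
    """Return the tree as a list of indented '- name' lines."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + "- " + name)
        lines.extend(render_tree(tree[name], depth + 1))
    return lines
```

Calling `print("\n".join(render_tree(build_tree(paths))))` on the 46 paths would print one `- name` line per path, indented by depth.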
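Based on the structure above, nearly all records share five core blocks:
- basicInfo: basic information about the intelligence item.
- attackerInfo: information about the attacker; currently empty in all records.
- linkedAnalysis: related analysis; currently empty in all records.
- threatIntelligence: the threat intelligence entries.
- whois: owner information; currently empty in all records.
Because three of these blocks are empty, the analysis focuses on basicInfo and threatIntelligence.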
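The claim that three blocks are always empty is easy to re-check whenever new files arrive. A minimal sketch, assuming the records have already been parsed into Python dicts; `count_non_empty_blocks` is a hypothetical helper name.

```python
from collections import Counter

BLOCKS = ("basicInfo", "attackerInfo", "linkedAnalysis", "threatIntelligence", "whois")


def count_non_empty_blocks(records):
    """Count, per top-level block, the records where it is present and non-empty."""
    non_empty = Counter({name: 0 for name in BLOCKS})
    for record in records:
        for name in BLOCKS:
            # Missing keys, None, [], {} and "" all count as empty.
            if record.get(name):
                non_empty[name] += 1
    return non_empty
```

A block whose count stays at 0 across all records can safely be ignored in the later analysis.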
4. Further Analysis of Data Entities
Below are four randomly selected records:
{
  "attackerInfo": [],
  "whois": [],
  "basicInfo": {
    "lastTime": "2019-08-14 08:32:02",
    "firstTime": "2019-08-14 08:32:00",
    "total": 1,
    "data": "http://005verf-desjcontrole01.com/index91484101498.php?e7ba90d65cc732dd74224a64fd71ac6b",
    "dataType": "url",
    "attackAction": [],
    "attackInProtocol": [],
    "malwareClass": [],
    "tags": "+"
  },
  "threatIntelligence": [
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_phishing",
      "target": ""
    },
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_reputation"
    }
  ],
  "linkedAnalysis": []
}
{
  "attackerInfo": [],
  "whois": [],
  "basicInfo": {
    "lastTime": "2019-08-14 08:32:02",
    "firstTime": "2019-08-14 08:32:00",
    "total": 1,
    "data": "http://125.90.52.61/login/",
    "dataType": "url",
    "attackAction": [],
    "attackInProtocol": [],
    "malwareClass": [],
    "tags": "+"
  },
  "threatIntelligence": [
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_phishing",
      "target": ""
    },
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_reputation"
    }
  ],
  "linkedAnalysis": []
}
{
  "attackerInfo": [],
  "whois": [],
  "basicInfo": {
    "lastTime": "2019-08-14 08:32:02",
    "firstTime": "2019-08-14 08:32:00",
    "total": 1,
    "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/card.php?cmd=_account-details&session=6d90353f642f7ceb1d346d11f005a0af&dispatch=f5a49cb9d14f2dd8008f29e1ed862079afbd8daf",
    "dataType": "url",
    "attackAction": [],
    "attackInProtocol": [],
    "malwareClass": [],
    "tags": "+"
  },
  "threatIntelligence": [
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_phishing",
      "target": ""
    },
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_reputation"
    }
  ],
  "linkedAnalysis": []
}
{
  "attackerInfo": [],
  "whois": [],
  "basicInfo": {
    "lastTime": "2019-08-14 08:32:02",
    "firstTime": "2019-08-14 08:32:00",
    "total": 1,
    "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/id.php?cmd=_account-details&session=08dbcf625a66abdc642862e11c569b00&dispatch=5788b9d560d6cd6b5b5c96c633fd13de342ba35a",
    "dataType": "url",
    "attackAction": [],
    "attackInProtocol": [],
    "malwareClass": [],
    "tags": "+"
  },
  "threatIntelligence": [
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_phishing",
      "target": ""
    },
    {
      "level": 75,
      "activeTime": "2019-08-14 08:32:02",
      "channel": "url_reputation"
    }
  ],
  "linkedAnalysis": []
}
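All four samples carry the same two channels, url_phishing and url_reputation, each with level 75. To compare records at scale, each record could be flattened into one row per threatIntelligence entry. This is a sketch only: `flatten_record` is a hypothetical helper, and the selection of columns is an arbitrary choice for illustration.

```python
def flatten_record(record):
    """Return one flat dict per threatIntelligence entry, joined with basicInfo."""
    basic = record.get("basicInfo", {})
    base = {
        "data": basic.get("data"),
        "dataType": basic.get("dataType"),
        "firstTime": basic.get("firstTime"),
        "lastTime": basic.get("lastTime"),
    }
    rows = []
    for entry in record.get("threatIntelligence", []):
        row = dict(base)  # copy so rows do not share state
        row["channel"] = entry.get("channel")
        row["level"] = entry.get("level")
        rows.append(row)
    return rows
```

The resulting rows can be passed to `pd.DataFrame(rows)` and then grouped by `channel` or `level` in the same way as the path counts earlier.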