Data Analysis: Threat Intelligence Data Analysis (2) 郝伟 2021/05/25

1. File Types

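First, here is a follow-up to yesterday's file classification. With a small change to the code, the files fall into the following groups: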

Files containing a single JSON object:
- url_c2-20190814day-actively.json
Files with one JSON object per line:
- domain_c2-20190814day-actively.json
- domain_reputation-20190814day-actively.json
- email_reputation-20190814day-actively.json
- email_spamming-20190814day-actively.json
- hash_reputation-20190814day-actively.json
- ip_c2-20190814day-actively.json
Files with multiple JSON objects, each spanning several lines:
- ip_proxy-20190814day-actively.json
- ip_proxy-20190814day-actively1.josn
- ip_reputation-20190814day-actively.json
- ip_spamming-20190814day-actively.json
- ip_tor-20190814day-actively.json
- url_c2-20190814day-actively.json
- url_phishing-20190814day-actively.json
- url_reputation-20190814day-actively.json
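The grouping above can be reproduced with a small check. The sketch below is an assumption, not the author's code: it uses the standard library's `json.JSONDecoder.raw_decode` to read consecutive top-level values from the file text, then labels the content `single` (one object), `jsonl` (several objects, each on one line) or `multi` (several objects, at least one spanning multiple lines). The function name and the labels are hypothetical.

```python
import json


def classify_json_layout(text: str) -> str:
    """Label file content as 'empty', 'single', 'jsonl' or 'multi' (hypothetical labels)."""
    decoder = json.JSONDecoder()
    spans = []  # (start, end) character offsets of each top-level JSON value
    pos, length = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        _, end = decoder.raw_decode(text, pos)
        spans.append((pos, end))
        pos = end
    if not spans:
        return 'empty'
    if len(spans) == 1:
        return 'single'
    # JSON Lines: no object contains a newline, i.e. each one fits on a single line
    if all('\n' not in text[start:end] for start, end in spans):
        return 'jsonl'
    return 'multi'
```

Each file would be read whole, e.g. `classify_json_layout(open(name, encoding='utf-8').read())`, which is simple but keeps the entire file in memory; very large files would need a streaming reader instead.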

2. Data Summary

Yesterday's output was filtered to keep only the path and count columns and saved as a CSV file. Part of its content is shown below:

Path,Count
./attackerInfo,38
./attackerInfo,265
./attackerInfo,45
./attackerInfo,1
./attackerInfo,1464
./attackerInfo,1464
./attackerInfo,9162
./attackerInfo,495
./attackerInfo,307
./attackerInfo,1
./attackerInfo,795
./attackerInfo,4036
./basicInfo,38
./basicInfo,265
./basicInfo,618
./basicInfo,452
...(rows omitted)...
./threatIntelligence/[*]/type,4
./whois,38
./whois,265
./whois,45
./whois,1
./whois,1464
./whois,1464
./whois,9162
./whois,495
./whois,307
./whois,1
./whois,795
./whois,4036
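The code that produced these paths is not shown here. As a minimal sketch, assuming the notation visible in the CSV (`.` as the root, `/` between keys, `[*]` for elements of a list), paths could be collected from each parsed record as follows. It also assumes each path is counted at most once per record, which is consistent with `./threatIntelligence/[*]/level` having the same total as `./threatIntelligence` even though the sample records below contain two intelligence entries each. `iter_paths` and `count_paths` are hypothetical names.

```python
from collections import Counter


def iter_paths(node, prefix='.'):
    """Yield slash-separated key paths, using '[*]' for list elements."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f'{prefix}/{key}'
            yield path
            yield from iter_paths(value, path)
    elif isinstance(node, list):
        # All list items share one wildcard segment; '.../[*]' itself is not emitted
        for item in node:
            yield from iter_paths(item, f'{prefix}/[*]')


def count_paths(records):
    """Count in how many records each path occurs (at most once per record)."""
    counts = Counter()
    for record in records:
        counts.update(set(iter_paths(record)))
    return counts
```

With this scheme an empty list such as `"attackerInfo": []` still yields `./attackerInfo`, but nothing below it, which matches the summary where `attackerInfo` and `whois` have no child paths.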

The following code then sums the counts for each path:

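import os
import pandas as pd

# data1.csv holds the filtered Path,Count rows and sits next to this script
datafile = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data1.csv')
df = pd.read_csv(datafile)

# Header: ID, path left-aligned in a 50-character column, summed count
print('{0}\t{1}\t{2}'.format('ID', 'Path'.ljust(50), 'Count'))

# groupby sorts the paths; each group holds all rows for one path
for idx, (path, group) in enumerate(df.groupby('Path'), start=1):
    print('{0:0>2d}\t{1}\t{2}'.format(idx, path.ljust(50), group['Count'].sum()))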

This produces the following result:

01 ./attackerInfo 18073
02 ./basicInfo 23759
03 ./basicInfo/attackAction 23759
04 ./basicInfo/attackInProtocol 19143
05 ./basicInfo/data 23759
06 ./basicInfo/dataType 23759
07 ./basicInfo/firstTime 23759
08 ./basicInfo/lastTime 23759
09 ./basicInfo/location 12937
10 ./basicInfo/location/cityName 12937
11 ./basicInfo/location/countryCode 12937
12 ./basicInfo/location/countryName 12937
13 ./basicInfo/location/latitude 12937
14 ./basicInfo/location/longitude 12937
15 ./basicInfo/location/provinceName 12937
16 ./basicInfo/malwareClass 23759
17 ./basicInfo/origin 4616
18 ./basicInfo/origin/md5 4616
19 ./basicInfo/origin/sha1 4616
20 ./basicInfo/origin/sha256 4616
21 ./basicInfo/tags 23759
22 ./basicInfo/total 23759
23 ./linkedAnalysis 23759
24 ./threatIntelligence 23759
25 ./threatIntelligence/[*]/DIRPort 372
26 ./threatIntelligence/[*]/ORPort 372
27 ./threatIntelligence/[*]/activeTime 23759
28 ./threatIntelligence/[*]/anonymity 3312
29 ./threatIntelligence/[*]/channel 23759
30 ./threatIntelligence/[*]/class 5
31 ./threatIntelligence/[*]/description 1
32 ./threatIntelligence/[*]/domain 15
33 ./threatIntelligence/[*]/email 743
34 ./threatIntelligence/[*]/exit 372
35 ./threatIntelligence/[*]/ip 169
36 ./threatIntelligence/[*]/level 23759
37 ./threatIntelligence/[*]/port 3313
38 ./threatIntelligence/[*]/reportDesc 2
39 ./threatIntelligence/[*]/reportName 2
40 ./threatIntelligence/[*]/saveTime 24
41 ./threatIntelligence/[*]/server 372
42 ./threatIntelligence/[*]/source 24
43 ./threatIntelligence/[*]/tags 18
44 ./threatIntelligence/[*]/target 1584
45 ./threatIntelligence/[*]/type 3312
46 ./whois 18073

As shown above, the data contains 46 distinct paths in total.
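If pandas is not available, the same per-path totals can be computed with the standard library alone. This is an alternative sketch, not the author's code; `sum_counts_by_path` and `print_summary` are hypothetical names. Sorting the keys with `sorted` should give the same order as the default key sorting of `groupby`, so the IDs line up with the table above.

```python
import csv
from collections import Counter


def sum_counts_by_path(lines):
    """Sum the Count column per Path from CSV lines that start with a 'Path,Count' header."""
    totals = Counter()
    for row in csv.DictReader(lines):
        totals[row['Path']] += int(row['Count'])
    return totals


def print_summary(totals, width=50):
    # Same layout as the pandas version: zero-padded ID, padded path, total count
    print('{0}\t{1}\t{2}'.format('ID', 'Path'.ljust(width), 'Count'))
    for idx, path in enumerate(sorted(totals), start=1):
        print('{0:0>2d}\t{1}\t{2}'.format(idx, path.ljust(width), totals[path]))
```

For example, `with open('data1.csv', newline='', encoding='utf-8') as f: print_summary(sum_counts_by_path(f))`. Summing the twelve `./attackerInfo` rows listed earlier gives 18073, the same total as in the table.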

3. Building the Structure

From the analysis above, we obtain the following hierarchical structure:

attackerInfo
basicInfo
- attackAction
- attackInProtocol
- data
- dataType
- firstTime
- lastTime
- location
  - cityName
  - countryCode
  - countryName
  - latitude
  - longitude
  - provinceName
- malwareClass
- origin
  - md5
  - sha1
  - sha256
- tags
- total
linkedAnalysis
threatIntelligence
- [*]/DIRPort
- [*]/ORPort
- [*]/activeTime
- [*]/anonymity
- [*]/channel
- [*]/class
- [*]/description
- [*]/domain
- [*]/email
- [*]/exit
- [*]/ip
- [*]/level
- [*]/port
- [*]/reportDesc
- [*]/reportName
- [*]/saveTime
- [*]/server
- [*]/source
- [*]/tags
- [*]/target
- [*]/type
whois
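An outline like the one above can be derived mechanically from the list of 46 paths. The sketch below is an assumption about how one might do it, not the author's method: it splits each path on `/`, drops the leading `.` root, and nests the segments into dictionaries. Unlike the hand-written outline, it keeps `[*]` as its own level (for example `- [*]` with `  - DIRPort` below it). `build_tree` and `render_tree` are hypothetical names.

```python
def build_tree(paths):
    """Turn './a/b/c' style paths into nested dicts, one level per segment."""
    tree = {}
    for path in paths:
        node = tree
        # Skip the leading '.' root, then walk or create one level per segment
        for part in path.split('/')[1:]:
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, depth=0):
    """Return indented outline lines: top level bare, deeper levels as '- ' bullets."""
    lines = []
    for name, children in tree.items():
        prefix = ('  ' * (depth - 1) + '- ') if depth else ''
        lines.append(prefix + name)
        lines.extend(render_tree(children, depth + 1))
    return lines
```

Because dictionaries keep insertion order, feeding in the sorted paths from the summary keeps the outline in the same order as the table.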

According to this structure, almost every record contains the following five core sections:

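basicInfo, basic intelligence information.
attackerInfo, information about the attacker; currently empty in all records.
linkedAnalysis, related (linked) analysis; currently empty in all records.
threatIntelligence, threat intelligence information.
whois, ownership (WHOIS) information; currently empty in all records.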

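Since three of these sections are empty (none of them has any child path in the summary above), the analysis focuses on the basicInfo and threatIntelligence sections.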
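A minimal sketch of both steps, assuming records have already been parsed into Python dicts: first confirm that the three sections really are empty, then flatten the two remaining sections into one row per threatIntelligence entry, repeating a few basicInfo fields on each row. The choice of fields and the names `nonempty_sections` and `flatten_record` are hypothetical.

```python
EMPTY_CANDIDATES = ('attackerInfo', 'linkedAnalysis', 'whois')

BASIC_FIELDS = ('data', 'dataType', 'firstTime', 'lastTime')   # assumed selection
INTEL_FIELDS = ('channel', 'level', 'activeTime')              # assumed selection


def nonempty_sections(records):
    """Count the records in which each candidate section exists and is non-empty."""
    counts = {name: 0 for name in EMPTY_CANDIDATES}
    for record in records:
        for name in EMPTY_CANDIDATES:
            if record.get(name):
                counts[name] += 1
    return counts


def flatten_record(record):
    """One flat row per threatIntelligence entry, carrying selected basicInfo fields."""
    basic = record.get('basicInfo', {})
    rows = []
    for intel in record.get('threatIntelligence', []):
        row = {field: basic.get(field) for field in BASIC_FIELDS}
        # Missing keys become None, since fields such as 'target' are optional
        row.update({field: intel.get(field) for field in INTEL_FIELDS})
        rows.append(row)
    return rows
```

If `nonempty_sections` returns all zeros over the full data set, dropping those three sections loses nothing; the flattened rows can then be loaded into a table for further analysis.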

4. Further Analysis of Data Entities

Below are four randomly selected records:

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://005verf-desjcontrole01.com/index91484101498.php?e7ba90d65cc732dd74224a64fd71ac6b",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://125.90.52.61/login/",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/card.php?cmd=_account-details&session=6d90353f642f7ceb1d346d11f005a0af&dispatch=f5a49cb9d14f2dd8008f29e1ed862079afbd8daf",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/id.php?cmd=_account-details&session=08dbcf625a66abdc642862e11c569b00&dispatch=5788b9d560d6cd6b5b5c96c633fd13de342ba35a",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}
