Data Analysis: Threat Intelligence Data Analysis (2) 郝伟 2021/05/25 [TOC]

1. File types

First, a supplement to yesterday's file classification. A small change to the code yields the following:

  • Files that contain a single JSON object
    • url_c2-20190814day-actively.json
  • Files that contain one JSON object per line
    • domain_c2-20190814day-actively.json
    • domain_reputation-20190814day-actively.json
    • email_reputation-20190814day-actively.json
    • email_spamming-20190814day-actively.json
    • hash_reputation-20190814day-actively.json
    • ip_c2-20190814day-actively.json
  • Files that contain multiple JSON objects, each spanning several lines
    • ip_proxy-20190814day-actively.json
    • ip_proxy-20190814day-actively1.josn
    • ip_reputation-20190814day-actively.json
    • ip_spamming-20190814day-actively.json
    • ip_tor-20190814day-actively.json
    • url_c2-20190814day-actively.json
    • url_phishing-20190814day-actively.json
    • url_reputation-20190814day-actively.json
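The three groups above can be told apart mechanically. The following is a minimal sketch, not the code used for the original classification: `classify_json_layout` is a hypothetical helper, and it assumes that consecutive objects in a file are separated only by whitespace.

```python
import json


def classify_json_layout(text):
    """Return 'single', 'one-per-line', or 'multi-line' for a file's text.

    Hypothetical helper; the three labels mirror the three groups above.
    Assumes consecutive JSON objects are separated by whitespace only.
    """
    decoder = json.JSONDecoder()
    text = text.strip()
    pos, count, multiline = 0, 0, False
    while pos < len(text):
        # raw_decode parses one JSON value and reports the index where it ended
        _, end = decoder.raw_decode(text, pos)
        if '\n' in text[pos:end]:
            multiline = True
        count += 1
        # raw_decode does not skip leading whitespace, so do it here
        pos = end
        while pos < len(text) and text[pos].isspace():
            pos += 1
    if count <= 1:
        return 'single'
    return 'multi-line' if multiline else 'one-per-line'
```

To apply it to a file, read the file's content as a string and pass it in; a `json.JSONDecodeError` indicates a layout that none of the three groups covers.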

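2. Data summary

Yesterday's output was filtered down to just the path and count columns and saved as a CSV file (download). Part of its content is shown below: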

Path,Count
./attackerInfo,38
./attackerInfo,265
./attackerInfo,45
./attackerInfo,1
./attackerInfo,1464
./attackerInfo,1464
./attackerInfo,9162
./attackerInfo,495
./attackerInfo,307
./attackerInfo,1
./attackerInfo,795
./attackerInfo,4036
./basicInfo,38
./basicInfo,265
./basicInfo,618
./basicInfo,452
...(several lines omitted)...
./threatIntelligence/[*]/type,4
./whois,38
./whois,265
./whois,45
./whois,1
./whois,1464
./whois,1464
./whois,9162
./whois,495
./whois,307
./whois,1
./whois,795
./whois,4036

The CSV is then analyzed with the following code, which sums the counts for each path.

import os
import pandas as pd

# data1.csv sits next to this script and has two columns: Path and Count
datafile = os.path.join(os.path.dirname(__file__), 'data1.csv')
df = pd.read_csv(datafile)

# Print a header, then one row per distinct path with its summed count
print('{0}\t{1}\t{2}'.format('ID', 'Path'.ljust(50, ' '), 'Count'))
for row_id, (path, group) in enumerate(df.groupby('Path'), start=1):
    total = group['Count'].sum()
    print('{0:0>2d}\t{1}\t{2}'.format(row_id, path.ljust(50, ' '), total))

This gives the following result:

01 ./attackerInfo 18073
02 ./basicInfo 23759
03 ./basicInfo/attackAction 23759
04 ./basicInfo/attackInProtocol 19143
05 ./basicInfo/data 23759
06 ./basicInfo/dataType 23759
07 ./basicInfo/firstTime 23759
08 ./basicInfo/lastTime 23759
09 ./basicInfo/location 12937
10 ./basicInfo/location/cityName 12937
11 ./basicInfo/location/countryCode 12937
12 ./basicInfo/location/countryName 12937
13 ./basicInfo/location/latitude 12937
14 ./basicInfo/location/longitude 12937
15 ./basicInfo/location/provinceName 12937
16 ./basicInfo/malwareClass 23759
17 ./basicInfo/origin 4616
18 ./basicInfo/origin/md5 4616
19 ./basicInfo/origin/sha1 4616
20 ./basicInfo/origin/sha256 4616
21 ./basicInfo/tags 23759
22 ./basicInfo/total 23759
23 ./linkedAnalysis 23759
24 ./threatIntelligence 23759
25 ./threatIntelligence/[*]/DIRPort 372
26 ./threatIntelligence/[*]/ORPort 372
27 ./threatIntelligence/[*]/activeTime 23759
28 ./threatIntelligence/[*]/anonymity 3312
29 ./threatIntelligence/[*]/channel 23759
30 ./threatIntelligence/[*]/class 5
31 ./threatIntelligence/[*]/description 1
32 ./threatIntelligence/[*]/domain 15
33 ./threatIntelligence/[*]/email 743
34 ./threatIntelligence/[*]/exit 372
35 ./threatIntelligence/[*]/ip 169
36 ./threatIntelligence/[*]/level 23759
37 ./threatIntelligence/[*]/port 3313
38 ./threatIntelligence/[*]/reportDesc 2
39 ./threatIntelligence/[*]/reportName 2
40 ./threatIntelligence/[*]/saveTime 24
41 ./threatIntelligence/[*]/server 372
42 ./threatIntelligence/[*]/source 24
43 ./threatIntelligence/[*]/tags 18
44 ./threatIntelligence/[*]/target 1584
45 ./threatIntelligence/[*]/type 3312
46 ./whois 18073

As this shows, the data contains 46 distinct paths in total.
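The paths in the table use `/` for nesting and `[*]` for list elements. The code that produced them is not shown here, so the following is only a sketch of how such paths might be enumerated and counted. `iter_paths` and `count_paths` are hypothetical names, and the sketch assumes each path is counted at most once per record.

```python
from collections import Counter


def iter_paths(node, prefix='.'):
    """Yield one path string per dict key, recursing into dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + '/' + key
            yield path
            yield from iter_paths(value, path)
    elif isinstance(node, list):
        for element in node:
            # every list element shares the same '[*]' placeholder
            yield from iter_paths(element, prefix + '/[*]')


def count_paths(records):
    """Count in how many records each path occurs (assumption: once per record)."""
    counts = Counter()
    for record in records:
        counts.update(set(iter_paths(record)))
    return counts
```

Under that assumption, a field that only some `threatIntelligence` entries carry, such as `target`, ends up with a lower count than a field that every record carries, such as `level`.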

3. Building the structure

From the analysis above, the following hierarchy can be derived:

  • attackerInfo
  • basicInfo
    • attackAction
    • attackInProtocol
    • data
    • dataType
    • firstTime
    • lastTime
    • location
      • cityName
      • countryCode
      • countryName
      • latitude
      • longitude
      • provinceName
    • malwareClass
    • origin
      • md5
      • sha1
      • sha256
    • tags
    • total
  • linkedAnalysis
  • threatIntelligence
    • [*]/DIRPort
    • [*]/ORPort
    • [*]/activeTime
    • [*]/anonymity
    • [*]/channel
    • [*]/class
    • [*]/description
    • [*]/domain
    • [*]/email
    • [*]/exit
    • [*]/ip
    • [*]/level
    • [*]/port
    • [*]/reportDesc
    • [*]/reportName
    • [*]/saveTime
    • [*]/server
    • [*]/source
    • [*]/tags
    • [*]/target
    • [*]/type
  • whois
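A hierarchy like this can be derived from the flat path list. Below is a minimal sketch with two hypothetical helpers, `build_tree` and `render`. It assumes every path starts with `./`, as in the table from the previous section.

```python
def build_tree(paths):
    """Nest './a/b/c' style paths into a dict of dicts."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split('/')[1:]:  # drop the leading '.'
            node = node.setdefault(part, {})
    return tree


def render(tree, indent=0):
    """Return the tree as a list of indented lines, two spaces per level."""
    lines = []
    for name, children in tree.items():
        lines.append('  ' * indent + name)
        lines.extend(render(children, indent + 1))
    return lines
```

Note that this sketch gives `[*]` a level of its own, whereas the list above folds it into the child name (for example `[*]/DIRPort`).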

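Based on this structure, there are five core blocks that appear in almost every record:

  • basicInfo: basic information about the intelligence item.
  • attackerInfo: information about the attacker; currently empty in all data.
  • linkedAnalysis: related analysis; currently empty in all data.
  • threatIntelligence: the threat intelligence details.
  • whois: owner information; currently empty in all data.

Since three of the five blocks are always empty, the analysis focuses on the other two: basicInfo and threatIntelligence.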
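Whether a block is empty everywhere can be checked directly on the parsed records. The following is a sketch only: `empty_ratio` is a hypothetical helper, and it treats a missing key, an empty list, and an empty dict all as "empty".

```python
def empty_ratio(records, keys):
    """Return, per key, the share of records whose value is missing or empty."""
    if not records:
        return {}
    ratios = {}
    for key in keys:
        # `not value` is True for None, [], {} and ''
        empty = sum(1 for record in records if not record.get(key))
        ratios[key] = empty / len(records)
    return ratios
```

A ratio of 1.0 for a key means that block carries no data in any record, which is the situation described above for three of the five blocks.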

4. Further analysis of the data entities

Four randomly selected records are shown below:

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://005verf-desjcontrole01.com/index91484101498.php?e7ba90d65cc732dd74224a64fd71ac6b",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://125.90.52.61/login/",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/card.php?cmd=_account-details&session=6d90353f642f7ceb1d346d11f005a0af&dispatch=f5a49cb9d14f2dd8008f29e1ed862079afbd8daf",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}

{
    "attackerInfo": [],
    "whois": [],
    "basicInfo": {
        "lastTime": "2019-08-14 08:32:02",
        "firstTime": "2019-08-14 08:32:00",
        "total": 1,
        "data": "http://140.82.27.13/bfa/verification/3n413n9n4c38444a8ba0/id.php?cmd=_account-details&session=08dbcf625a66abdc642862e11c569b00&dispatch=5788b9d560d6cd6b5b5c96c633fd13de342ba35a",
        "dataType": "url",
        "attackAction": [],
        "attackInProtocol": [],
        "malwareClass": [],
        "tags": "+"
    },
    "threatIntelligence": [
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_phishing",
            "target": ""
        },
        {
            "level": 75,
            "activeTime": "2019-08-14 08:32:02",
            "channel": "url_reputation"
        }
    ],
    "linkedAnalysis": []
}
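All four samples share the same shape: one `basicInfo` block describing a URL, plus two `threatIntelligence` entries (`url_phishing` and `url_reputation`). For further analysis it may help to flatten each record into one row per intelligence entry. The sketch below is one possible way to do that; `flatten_record` and the choice of output columns are assumptions, not part of the original analysis.

```python
def flatten_record(record):
    """Return one flat dict per threatIntelligence entry of a record."""
    basic = record.get('basicInfo', {})
    rows = []
    for entry in record.get('threatIntelligence', []):
        rows.append({
            'data': basic.get('data'),
            'dataType': basic.get('dataType'),
            'channel': entry.get('channel'),
            'level': entry.get('level'),
            'activeTime': entry.get('activeTime'),
        })
    return rows
```

Fields that an entry does not carry come back as `None`, so entries with differing keys can still be placed in the same table.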
