Dynamically Traversing a JSON File to Generate the Full Path of Every Key 郝伟 2021/04/29 [TOC]

1. Introduction

When analyzing a JSON file, we often need the complete, unique path of each of its keys. Take the following JSON data as an example:

{
    "id": "vulnerability--103f862a-8d0d-11eb-a463-6fc32abf5537",
    "name": "CVE-2020-0001",
    "created": "2020-01-08T19:15:00.000Z",
    "modified": "2020-01-14T21:52:00.000Z",
    "description": "In getProcessRecordLocked of ActivityManagerService.java isolated apps are not handled correctly. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation. Product: Android Versions: Android-8.0, Android-8.1, Android-9, and Android-10 Android ID: A-140055304",
    "created_by_ref": "identity--1531171c-cf59-435a-9e2f-f57721a5da4b",
    "external_references": [
        {
            "source_name": "NIST NVD",
            "url": "https://nvd.nist.gov/vuln/detail/CVE-2020-0001"
        },
        {
            "source_name": "CONFIRM",
            "url": "https://source.android.com/security/bulletin/2020-01-01"
        }
    ],
    "x_opencti_base_score": 7.8,
    "x_opencti_base_severity": "HIGH",
    "x_opencti_attack_vector": "LOCAL",
    "x_opencti_integrity_impact": "HIGH",
    "x_opencti_availability_impact": "HIGH",
    "type": "vulnerability",
    "spec_version": "2.1"
}

After the analysis we want the unique path of every key, as listed below. The elements of a list are represented by [*].

./id
./name
./created
./modified
./description
./created_by_ref
./external_references
./external_references/[*]/source_name
./external_references/[*]/url
./x_opencti_base_score
./x_opencti_base_severity
./x_opencti_attack_vector
./x_opencti_integrity_impact
./x_opencti_availability_impact
./type
./spec_version

2. Implementation

# -*- coding: utf-8 -*-
"""
Created:         2021/04/29
Original author: 郝伟
Purpose:         Analyze all unique key paths in the input nvd2020.json in order to understand the format of the JSON file.
"""
import json

def load_json(json_filepath):
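    '''
    Purpose: load JSON data from a file and return the parsed object.
    Args:
          json_filepath  path of the input JSON data file.
    Returns: the JSON data as in-memory Python objects (a dict for a JSON object)
    '''
    with open(json_filepath, 'r', encoding='utf-8') as load_f:
        data = json.load(load_f)
    return data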

def analyze1(json_data, paths, cur_path):
    '''
    Purpose: recursively work out the full path of every key in json_data. A list is represented by [*].
    Args:
          json_data  the JSON data (or sub-tree) to analyze
          paths      the unique full paths collected so far
          cur_path   the components of the current path
    '''
    if isinstance(json_data, dict):
        for key in json_data:
            new_path = '/'.join(cur_path) + '/' + key
            # Record each path only once.
            if new_path not in paths:
                paths.append(new_path)
            # Descend into the value, then restore the current path.
            cur_path.append(key)
            analyze1(json_data[key], paths, cur_path)
            del cur_path[-1]
    elif isinstance(json_data, list):
        # All elements of a list share the same [*] component.
        cur_path.append('[*]')
        for item in json_data:
            analyze1(item, paths, cur_path)
        del cur_path[-1]
    return (paths, cur_path)

def analyze(json_data):
    paths=[]
    cur_path=['.']
    paths, cur_path = analyze1(json_data, paths, cur_path)
    return paths

# Choose the file to analyze; uncomment the first line to use the full data set.
# filepath=r'c:\data\nvd2020.json'
filepath=r'c:\data\demo1.json'
jdata = load_json(filepath)
for path in analyze(jdata):
    print(path)
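
To try the traversal without a file on disk, the following is a minimal sketch of the same idea applied to an in-memory value. The name `key_paths`, the nested helper `walk`, and the use of a set for the duplicate check are assumptions made for illustration and are not part of the original script. The output format (`.` as the root, `/` as the separator, `[*]` for lists) follows the code above.

```python
def key_paths(json_data):
    """Return the unique key paths of json_data in first-seen order."""
    paths = []    # keeps first-seen order for printing
    seen = set()  # constant-time duplicate check

    def walk(node, prefix):
        if isinstance(node, dict):
            for key, value in node.items():
                path = prefix + '/' + key
                if path not in seen:
                    seen.add(path)
                    paths.append(path)
                walk(value, path)
        elif isinstance(node, list):
            # Every element of a list shares the same [*] component.
            for item in node:
                walk(item, prefix + '/[*]')

    walk(json_data, '.')
    return paths


# A shortened, illustrative version of the record from section 1.
sample = {
    "name": "CVE-2020-0001",
    "external_references": [
        {"source_name": "NIST NVD",
         "url": "https://nvd.nist.gov/vuln/detail/CVE-2020-0001"},
        {"source_name": "CONFIRM"},
    ],
}
for path in key_paths(sample):
    print(path)
```

The second list element lacks `url`, yet `./external_references/[*]/url` still appears once in the result, because the paths are collected across all elements of a list.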

3. Demonstration

For example, analyzing the data in nvd2020.rar (1.4 MB compressed, 17.6 MB once extracted) yields the following:

./objects
./objects/[*]/name
./objects/[*]/identity_class
./objects/[*]/type
./objects/[*]/spec_version
./objects/[*]/id
./objects/[*]/created
./objects/[*]/modified
./objects/[*]/description
./objects/[*]/created_by_ref
./objects/[*]/external_references
./objects/[*]/external_references/[*]/source_name
./objects/[*]/external_references/[*]/url
./objects/[*]/x_opencti_base_score
./objects/[*]/x_opencti_base_severity
./objects/[*]/x_opencti_attack_vector
./objects/[*]/x_opencti_integrity_impact
./objects/[*]/x_opencti_availability_impact
./type
./id

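Although the file is large, its structure is very simple. The root has only three top-level keys: id, type and objects. The largest of them is objects, a list containing a number of objects. Across these objects the following attributes occur: name, identity_class, type, spec_version, id, created, modified, description, created_by_ref, external_references, x_opencti_base_score, x_opencti_base_severity, x_opencti_attack_vector, x_opencti_integrity_impact and x_opencti_availability_impact. Because the listing is the union of the paths found in all list elements, an individual object may carry only some of them. external_references is itself a list, and its elements have the attributes source_name and url.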
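
The path listing shows which keys exist, but not how often each one occurs. One way to find out is to count the occurrences of every path, sketched below with `collections.Counter`. The function name `count_key_paths` and the miniature `bundle` value are hypothetical and only stand in for the real nvd2020 data.

```python
from collections import Counter


def count_key_paths(json_data):
    """Count how many times each key path occurs in json_data."""
    counts = Counter()

    def walk(node, prefix):
        if isinstance(node, dict):
            for key, value in node.items():
                path = prefix + '/' + key
                counts[path] += 1
                walk(value, path)
        elif isinstance(node, list):
            for item in node:
                walk(item, prefix + '/[*]')

    walk(json_data, '.')
    return counts


# Hypothetical miniature of the structure described above (values are made up).
bundle = {
    "type": "example-root",
    "objects": [
        {"type": "identity", "name": "example", "identity_class": "example-class"},
        {"type": "vulnerability", "name": "CVE-2020-0001"},
    ],
}
for path, n in count_key_paths(bundle).items():
    print(n, path)
```

A path whose count is lower than that of its sibling paths, such as `./objects/[*]/identity_class` in this sample, belongs to a key that only some of the objects carry.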

4. Appendix: Python Data Types

The following is excerpted from the article "python中判断变量的类型" (determining the type of a variable in Python):

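# Return a description of the variable's type
def getType(variate):
    arr = { "int":"integer",
            "float":"float",
            "str":"string",
            "list":"list",
            "tuple":"tuple",
            "dict":"dict",
            "set":"set"
    }
    # Python has no typeof(); type(x).__name__ gives the type name as a string.
    vartype = type(variate).__name__
    if not (vartype in arr):
        return "unknown type"
    return arr[vartype]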
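
In Python the name of a value's type is available as `type(value).__name__`. The short sketch below shows the names obtained for the values that `json.loads` can return. The helper name `type_name` and the sample JSON string are illustrative assumptions.

```python
import json


def type_name(value):
    """Return the name of value's Python type, e.g. 'dict' or 'list'."""
    return type(value).__name__


parsed = json.loads(
    '{"score": 7.8, "refs": [], "name": "x", "count": 3, "flag": true, "none": null}'
)
print("root", type_name(parsed))
for key, value in parsed.items():
    print(key, type_name(value))
```

JSON `true`/`false` and `null` are parsed as `bool` and `NoneType`, neither of which is a key of the lookup table above.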
