Example: Reading Data from JSON and Writing It to CSV  Hao Wei  2021/01/21

1. Task Overview

Read data from a JSON file, then write it out as CSV.

The input file is input.json, with the following contents:

{
    "ip":"192.168.101.28",
    "state":"open",
    "product":"Windows"
}

The intermediate data is a two-dimensional array with the following contents:

rows = [["ip", "192.168.101.28"], 
 ["state", "open"],
 ["product", "Windows"]]

The output is in CSV format, with the following contents:

Key,Value
ip,192.168.101.28
state,open
product,Windows

2. Source Code

The processing functions are shown below.

# -*- coding: utf-8 -*-
"""
Created on Thu Jan 21 18:02:37 2021

@author: hao
"""

import json
import csv

'''
# pandas-based alternative; note that to_csv also writes a header row and an index column
import pandas as pd
def save_as_csv(array,path):
    """Save the array to a .csv file."""
    data = pd.DataFrame(array)
    data.to_csv(path)
'''

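def save_array_to_csv(array, csvfile):
    # newline='' stops the csv module from writing blank lines between rows on Windows
    with open(csvfile, 'w', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(array)

def save_array_to_csv1(array, csvfile):
    # Manual variant: joins each row with commas and applies no quoting
    with open(csvfile, 'w', encoding='utf-8') as f:
        for item in array:
            f.write(','.join(item) + '\n')

def load_json(jsonfile):
    # An explicit encoding avoids depending on the platform default
    with open(jsonfile, 'r', encoding='utf-8') as load_f:
        data = json.load(load_f)
    return data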

data = load_json("d:\\data\\input.json")

rows = [['Key', 'Value']]
if isinstance(data, dict):
    for key in data:
        # str() guards against non-string values such as numbers
        rows.append([key, str(data[key]).strip()])

print(rows)


save_array_to_csv1(rows, "d:\\data\\output.csv")
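
The two save functions above behave differently once a value contains a comma: the manual join writes it unquoted, while csv.writer quotes it. The following is a minimal sketch of that difference, not part of the original script. The helper names (rows_from_dict, to_csv_joined, to_csv_quoted) and the sample record are hypothetical, and the output is built in memory so that no file paths are needed.

```python
import csv
import io


def rows_from_dict(data):
    """Build [key, value] rows; str() keeps non-string values from failing."""
    rows = [["Key", "Value"]]
    for key, value in data.items():
        rows.append([key, str(value).strip()])
    return rows


def to_csv_joined(rows):
    """Manual join, as in save_array_to_csv1: no quoting is applied."""
    return "".join(",".join(row) + "\n" for row in rows)


def to_csv_quoted(rows):
    """csv.writer quotes fields that contain the delimiter."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record: one value contains a comma, one is not a string.
sample = {"ip": "192.168.101.28", "product": "Windows, Server", "port": 445}
rows = rows_from_dict(sample)
print(to_csv_joined(rows))   # the product line splits into three fields
print(to_csv_quoted(rows))   # the product value is wrapped in double quotes
```

For data like the input.json shown earlier, where no value contains a comma, quote, or line break, both variants produce the same text.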

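3. Performance Test

To measure load performance, the code below records the read time and memory usage when loading a test file, test.json (709.64 MB).

The data has only two levels. The test code used to inspect it, its output, and the resulting structure are shown below: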

# Test code
jdata = load_json(r'd:\data\test.json')
for key in jdata:
    print('Number of {0}:\t{1}.'.format(key, len(jdata[key])))

# Output
Number of mode: 6.
Number of vertices:     425025.
Number of edges:        1956492.    

# test.json data structure
.
├── mode [6]           # list of 6 elements
|
├── vertices[425025]   # list of 425,025 elements
|
└── edges [1956492]    # list of 1,956,492 elements

Note: test environment: AMD 3700, 32 GB RAM, 512 GB NVMe SSD

ID      Time(s)   Speed(MB/s)    Memory(GB)
1        8.18       86.74         2.68
2        9.47       74.91         2.74
3        10.04      70.65         2.72
4        10.17      69.76         2.74
5        9.59       74.00         2.73
6        9.51       74.59         2.76
7        9.43       75.29         2.73
8        9.97       71.18         2.75
9        10.24      69.28         2.73
10       9.50       74.66         2.75

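Overall, the file loads at roughly 70 to 80 MB/s and the process uses about 2.7 GB of memory. Since the process takes about 0.1 GB by default, the data itself occupies about 2.6 GB, so each MB of JSON takes roughly 3.7 MB of memory. This depends on the specific file, of course: files with a different structure can give very different results.

Source code: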
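
The per-MB figure follows from simple arithmetic on the measured values. Here is a minimal sketch using the rounded figures from the text and assuming 1 GB = 1024 MB; the variable names are illustrative only.

```python
# Rounded figures taken from the measurements in this section.
file_size_mb = 709.64      # size of test.json on disk
measured_rss_gb = 2.7      # typical process memory after loading
baseline_gb = 0.1          # memory the process uses before loading

# Memory attributable to the loaded data, and its ratio to the file size.
data_gb = measured_rss_gb - baseline_gb
ratio = data_gb * 1024 / file_size_mb

print(f"JSON data in memory: {data_gb:.1f} GB")
print(f"Memory per MB of JSON: {ratio:.2f} MB")
```

The ratio is only as precise as the rounded inputs, so it is best read as "a little under 4x" for this particular file.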

# -*- coding: utf-8 -*-
"""
Created on Sat Jan 23 09:30:23 2021

@author: pc
"""

import time
import json
import psutil
import os


def load_json(jsonfile):
    # Note: without an explicit encoding, open() uses the locale default,
    # which is GBK on Chinese-locale Windows
    with open(jsonfile, 'r', encoding='utf-8') as load_f:
        data = json.load(load_f)
    return data

file1 = ['test.json', 709.64]  # [path, file size in MB]

#%%time 
print('ID\tTime(s)\tSpeed(MB/s)\tMemory(GB)')
for i in range(10):
    t1 = time.time()
    jdata = load_json(file1[0])
    t2 = time.time()
    # Add 1 ms so the division below can never hit zero
    t = t2 - t1 + 0.001
    speed = file1[1] / t
    # Resident set size of the current process, converted from bytes to GB
    mem = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024 / 1024
    print('{0}\t{1:.2f}\t{2:.2f}\t{3:.2f}'.format(i + 1, t, speed, mem))
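
The script above reports the resident memory of the whole process, which includes the interpreter's own baseline. As a possible cross-check, the sketch below uses the standard library's tracemalloc instead of psutil, so it counts only allocations made by Python during the load. This is a different technique from the one used for the table, it slows loading noticeably, and its numbers are not directly comparable to the RSS figures. The measure_load helper and the small generated file are hypothetical stand-ins so that the sketch runs without test.json.

```python
import json
import os
import tempfile
import time
import tracemalloc


def measure_load(path):
    """Return (seconds, peak_bytes) for one json.load of the given file."""
    tracemalloc.start()
    start = time.perf_counter()
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return elapsed, peak


# Hypothetical small stand-in for test.json, so the sketch runs anywhere.
sample = {"mode": list(range(6)), "vertices": [{"id": i} for i in range(1000)]}
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)
    tmp_path = tmp.name

elapsed, peak = measure_load(tmp_path)
size_mb = os.path.getsize(tmp_path) / 1024 / 1024
print(f"{elapsed:.4f} s, peak {peak / 1024 / 1024:.2f} MB for a {size_mb:.3f} MB file")
os.remove(tmp_path)
```

To try it on the real data, call measure_load with the path to test.json in place of the temporary file.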
