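Example: Reading Data from JSON and Writing It to CSV  Hao Wei (郝伟) 2021/01/21 [TOC]
1. Task Overview
Read data from a JSON file and write it out as CSV.
The input file, input.json, contains: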
{
"ip":"192.168.101.28",
"state":"open",
"product":"Windows"
}
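The intermediate data is a two-dimensional array (the code in section 2 additionally prepends a ['Key', 'Value'] header row):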
rows = [["ip", "192.168.101.28"],
["state", "open"],
["product", "Windows"]]
The output is in CSV format:
Key,Value
ip,192.168.101.28
state,open
product,Windows
2. Source Code
The processing functions are shown below.
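# -*- coding: utf-8 -*-
"""
Created on Thu Jan 21 18:02:37 2021
@author: hao
"""
import json
import csv

'''
# pandas-based alternative; note that it also writes a header row and an index column
import pandas as pd
def save_as_csv(array, path):
    """Save the array to a .csv file."""
    data = pd.DataFrame(array)
    data.to_csv(path)
'''

def save_array_to_csv(array, csvfile):
    # newline='' keeps the csv module from emitting blank lines on Windows
    with open(csvfile, 'w', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(array)

def save_array_to_csv1(array, csvfile):
    # Plain join: only safe when no value contains a comma, quote, or newline
    with open(csvfile, 'w', encoding='utf-8') as f:
        for item in array:
            f.write(','.join(item) + '\n')

def load_json(jsonfile):
    # JSON files are normally UTF-8, so state the encoding explicitly
    with open(jsonfile, 'r', encoding='utf-8') as load_f:
        return json.load(load_f)

data = load_json("d:\\data\\input.json")
rows = [['Key', 'Value']]
if isinstance(data, dict):
    for key in data:
        # str() guards against non-string values such as numbers
        rows.append([key, str(data[key]).strip()])
print(rows)
save_array_to_csv1(rows, "d:\\data\\output.csv")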
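The two writer functions above behave differently once a value contains a comma. The following is a minimal sketch of that difference using in-memory buffers instead of files. The helper names `rows_to_csv_join` and `rows_to_csv_writer` and the sample value `Windows, Server` are made up for illustration and do not come from the input file.

```python
import csv
import io

def rows_to_csv_join(rows):
    """Join fields with commas, as save_array_to_csv1 does (no quoting)."""
    return ''.join(','.join(row) + '\n' for row in rows)

def rows_to_csv_writer(rows):
    """Use csv.writer, which quotes fields that contain commas or quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical data: the product value contains a comma.
rows = [['Key', 'Value'], ['product', 'Windows, Server']]

joined = rows_to_csv_join(rows)
written = rows_to_csv_writer(rows)

# Parse both results back to see how many fields each data line yields.
parsed_join = list(csv.reader(io.StringIO(joined)))
parsed_writer = list(csv.reader(io.StringIO(written)))

print(parsed_join[1])    # the comma splits the value into two fields
print(parsed_writer[1])  # the quoted value survives as one field
```

For the sample input in section 1 both approaches produce the same file, so the join-based version is fine there. `csv.writer` is the safer choice when the values are not under your control.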
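3. Performance Test
To evaluate read performance, the code below measures load time and memory usage. Test file: here (709.64 MB).
The data has only two levels and is structured as follows: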
# Test code
jdata = load_json(r'd:\data\test.json')
for key in jdata:
    print('Number of {0}: {1}.'.format(key, len(jdata[key])))
# Output
Number of mode: 6.
Number of vertices: 425025.
Number of edges: 1956492.
# Structure of test.json
.
├── mode [6]            # list of 6 elements
│
├── vertices [425025]   # list of 425,025 elements
│
└── edges [1956492]     # list of 1,956,492 elements
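A summary like the one above can be produced for any loaded JSON object with a small helper. This is only a sketch: `describe_top_level` is a hypothetical name, and `sample` is a tiny stand-in with made-up element contents, not the real test.json.

```python
def describe_top_level(data):
    """Return one line per top-level key, with the length of sized values."""
    lines = []
    for key, value in data.items():
        if isinstance(value, (list, dict, str)):
            lines.append('{0} [{1}]'.format(key, len(value)))
        else:
            # Scalars (numbers, booleans, null) have no length; show the value
            lines.append('{0} = {1!r}'.format(key, value))
    return lines

# Tiny stand-in for test.json; the element contents are invented.
sample = {
    'mode': [0] * 6,
    'vertices': [{'id': i} for i in range(4)],
    'edges': [[0, 1], [1, 2], [2, 3]],
}

for line in describe_top_level(sample):
    print(line)
```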
Note: the test environment was an AMD 3700 CPU, 32 GB RAM, and a 512 GB NVMe SSD.
ID Time(s) Speed(MB/s) Memory(GB)
1 8.18 86.74 2.68
2 9.47 74.91 2.74
3 10.04 70.65 2.72
4 10.17 69.76 2.74
5 9.59 74.00 2.73
6 9.51 74.59 2.76
7 9.43 75.29 2.73
8 9.97 71.18 2.75
9 10.24 69.28 2.73
10 9.50 74.66 2.75
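Overall, the read speed is roughly 70 to 80 MB/s in most runs, and memory usage is about 2.7 GB. Since the process occupies about 0.1 GB by default, the data itself takes about 2.6 GB, so each MB of JSON occupies roughly 3.7 MB of memory. This depends on the specific file: for files with a different structure the figure can differ considerably.
Source code: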
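The summary figures can be rechecked from the table with a few lines of arithmetic. The sketch below copies the measured values from the table. The 0.1 GB baseline is the estimate stated in the text, and whether 1 GB is counted as 1000 MB or 1024 MB is an assumption, so both are computed.

```python
# Values copied from the results table.
times = [8.18, 9.47, 10.04, 10.17, 9.59, 9.51, 9.43, 9.97, 10.24, 9.50]
memory_gb = [2.68, 2.74, 2.72, 2.74, 2.73, 2.76, 2.73, 2.75, 2.73, 2.75]
file_mb = 709.64
baseline_gb = 0.1  # default process footprint, as estimated in the text

mean_time = sum(times) / len(times)
mean_speed = file_mb / mean_time
mean_mem = sum(memory_gb) / len(memory_gb)

# Memory attributed to the data itself, per MB of JSON on disk.
data_gb = mean_mem - baseline_gb
ratio_1000 = data_gb * 1000 / file_mb  # assuming 1 GB = 1000 MB
ratio_1024 = data_gb * 1024 / file_mb  # assuming 1 GB = 1024 MB

print('mean time:  {0:.2f} s'.format(mean_time))
print('mean speed: {0:.2f} MB/s'.format(mean_speed))
print('mean mem:   {0:.2f} GB'.format(mean_mem))
print('ratio:      {0:.2f} to {1:.2f} MB per MB of JSON'.format(ratio_1000, ratio_1024))
```

The mean load time comes out at about 9.6 s (about 74 MB/s), and the memory ratio lands between roughly 3.7 and 3.8 depending on the unit convention, which agrees with the estimate above.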
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 23 09:30:23 2021
@author: pc
"""
import time
import json
import psutil
import os

def load_json(jsonfile):
    # Note: without an explicit encoding, open() uses the system default (GBK on Chinese Windows)
    with open(jsonfile, 'r', encoding='utf-8') as load_f:
        return json.load(load_f)

# [file path, file size in MB]
file1 = ['test.json', 709.64]

#%%time
print('ID\tTime(s)\tSpeed(MB/s)\tMemory(GB)')
for i in range(10):
    t1 = time.time()
    jdata = load_json(file1[0])
    t2 = time.time()
    t = t2 - t1 + 0.001  # small offset avoids division by zero
    speed = file1[1] / t
    # Resident set size of the current process, converted from bytes to GB
    mem = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024 / 1024
    print('{0}\t{1:.2f}\t{2:.2f}\t{3:.4f}'.format(i + 1, t, speed, mem))
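The script above reports the resident set size of the whole process, so the share taken by the loaded data has to be estimated by subtracting a baseline. One alternative, sketched here, is to measure the Python-level allocations made during `json.load` with the standard-library `tracemalloc` module. This is a different technique from the `psutil` RSS measurement and will not give identical numbers. Tracing also slows loading, so it should not be combined with the timing loop. The `measure_load` helper and the generated file are assumptions for illustration: the file is a small synthetic stand-in, not test.json.

```python
import json
import os
import tempfile
import tracemalloc

def measure_load(path):
    """Return (data, file size in bytes, bytes still allocated after loading)."""
    size = os.path.getsize(path)
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    with open(path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return data, size, after - before

# Build a small synthetic file; its contents are invented for this sketch.
sample = {
    'vertices': [{'id': i, 'name': 'v{0}'.format(i)} for i in range(20000)],
    'edges': [[i, i + 1] for i in range(20000)],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(sample, f)
    data, size, allocated = measure_load(path)

print('file size: {0:.2f} MB'.format(size / 1024 / 1024))
print('allocated: {0:.2f} MB'.format(allocated / 1024 / 1024))
print('ratio:     {0:.2f}'.format(allocated / size))
```

The ratio printed for the synthetic file should not be expected to match the figure measured for test.json, since it depends heavily on how the data is structured.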