Reading Data from JSON and Writing It Out as CSV: An Example
郝伟 (Hao Wei), 2021/01/21
Read data from a JSON file, then write it out as CSV.
Input file input.json, with the content:
{ "ip":"192.168.101.28", "state":"open", "product":"Windows" }
The intermediate data is a two-dimensional array, with the content:
rows = [["ip", "192.168.101.28"], ["state", "open"], ["product", "Windows"]]
The output is in CSV format, with the content:
Key,Value
ip,192.168.101.28
state,open
product,Windows
The processing functions are shown below.
# -*- coding: utf-8 -*-
"""
Created on Thu Jan 21 18:02:37 2021

@author: hao
"""
import json
import csv

'''
# pandas-based version, but it adds a header row and an index column
import pandas as pd

def save_as_csv(array, path):
    """Save an array to a .csv file"""
    data = pd.DataFrame(array)
    data.to_csv(path)
'''

def save_array_to_csv(array, csvfile):
    # newline='' keeps the csv module from writing blank lines on Windows
    with open(csvfile, 'w', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(array)

def save_array_to_csv1(array, csvfile):
    with open(csvfile, 'w') as f:
        for item in array:
            f.write(','.join(item) + '\n')

def load_json(jsonfile):
    data = []
    with open(jsonfile, 'r') as load_f:
        data = json.load(load_f)
    return data

data = load_json("d:\\data\\input.json")
rows = [['Key', 'Value']]
if isinstance(data, dict):
    for key in data:
        rows.append([key, data[key].strip()])
print(rows)
save_array_to_csv1(rows, "d:\\data\\output.csv")
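The pandas version is commented out above because DataFrame.to_csv writes an extra header row and index column by default; if pandas is preferred, both can be suppressed with header=False and index=False. A minimal sketch (the output path is illustrative):

```python
import pandas as pd

def save_as_csv_plain(array, path):
    """Save a 2-D array as CSV without pandas' default header row and index column."""
    pd.DataFrame(array).to_csv(path, header=False, index=False)

rows = [['Key', 'Value'], ['ip', '192.168.101.28'],
        ['state', 'open'], ['product', 'Windows']]
save_as_csv_plain(rows, 'output.csv')
```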
To test read performance, the following code measures loading time and memory usage. Test file: here (709.64 MB).
The data has only two levels of nesting, structured as follows:
# Test code
jdata = load_json(r'd:\data\test.json')
for key in jdata:
    print('# of {0}:\t{1}.'.format(key, len(jdata[key])))

# Output
# of mode:	6.
# of vertices:	425025.
# of edges:	1956492.

# test.json data structure
.
├── mode [6]          # list of 6 elements
├── vertices [425025] # list of 425,025 elements
└── edges [1956492]   # list of 1,956,492 elements
Note: test environment: AMD 3700, 32 GB RAM, 512 GB NVMe SSD.
ID	Time(s)	Speed(MB/s)	Memory(GB)
1	8.18	86.74	2.68
2	9.47	74.91	2.74
3	10.04	70.65	2.72
4	10.17	69.76	2.74
5	9.59	74.00	2.73
6	9.51	74.59	2.76
7	9.43	75.29	2.73
8	9.97	71.18	2.75
9	10.24	69.28	2.73
10	9.50	74.66	2.75
Overall, the file is read at roughly 70–80 MB/s and memory usage is about 2.7 GB. Since the interpreter itself occupies about 0.1 GB by default, the actual data footprint is about 2.6 GB, i.e. each MB of JSON takes roughly 3.7 MB of memory. This of course depends on the specific file; results can differ considerably for files with different structures.
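The ~3.7 figure follows directly from the numbers above, as a quick check shows:

```python
file_mb = 709.64    # size of test.json in MB
mem_gb = 2.7 - 0.1  # measured RSS minus the ~0.1 GB interpreter baseline
ratio = mem_gb * 1024 / file_mb
print(round(ratio, 2))  # → 3.75
```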
Source code:
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 23 09:30:23 2021

@author: pc
"""
import time
import json
import psutil
import os

def load_json(jsonfile):
    data = []
    # Note: without an explicit encoding, the default (e.g. GBK on Chinese
    # Windows) would be used
    with open(jsonfile, 'r', encoding='utf-8') as load_f:
        data = json.load(load_f)
    return data

file1 = ['test.json', 709.64]

#%%time
print('ID\tTime(s)\tSpeed(MB/s)\tMemory(GB)')
for i in range(10):
    t1 = time.time()
    jdata = load_json(file1[0])
    t2 = time.time()
    t = t2 - t1 + 0.001  # small offset to avoid division by zero
    speed = file1[1] / t
    mem = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024 / 1024
    print(u'{0}\t{1:.2f}\t{2:.2f}\t{3:.4f}'.format(i + 1, t, speed, mem))
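psutil reports the whole process's resident set size; the standard-library tracemalloc module can instead attribute memory to the Python objects allocated during parsing. A sketch of that alternative measurement (the helper name and file path are illustrative):

```python
import json
import tracemalloc

def load_json_traced(jsonfile):
    """Load a JSON file while tracking Python-level allocations with tracemalloc."""
    tracemalloc.start()
    with open(jsonfile, 'r', encoding='utf-8') as f:
        data = json.load(f)
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print('peak allocations: {:.2f} MB'.format(peak / 1024 / 1024))
    return data

# Usage (illustrative path):
# jdata = load_json_traced(r'd:\data\test.json')
```

Note that tracemalloc itself adds overhead, so it suits one-off profiling rather than the repeated timing loop above.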