Matching Problems with Data Captured by nmap
郝伟,刘加勇 2020/12/26
Read the records captured in nmapscan.txt, compare each against the dictionary pocs.json, and on a successful match return the key of the matched entry.
For each record in nmapscan.txt, take the name, product, and port values. Match the record's name and product fields against the rule field in pocs.json, and match the record's port against the port value in pocs.json. On a hit, report the name of the matching poc, i.e. its key, e.g. "Apache Solr_Unauthorized" or "CouchDB_Unauthorized".
There is a single input file, nmapscan.txt. Its content is multiple JSON records (each line is one standalone JSON object), for example:
{"ip":"192.168.101.28", "ipinfo":{"ports": {"135": {"state": "open", "name": "msrpc", "product": "Microsoft Windows RPC", "version": "", "extrainfo": ""}, "139": {"state": "open", "name": "netbios-ssn", "product": "Microsoft Windows netbios-ssn", "version": "", "extrainfo": ""}, "445": {"state": "open", "name": "microsoft-ds", "product": "Microsoft Windows 2003 or 2008 microsoft-ds", "version": "", "extrainfo": ""}, "1025": {"state": "open", "name": "msrpc", "product": "Microsoft Windows RPC", "version": "", "extrainfo": ""}}, "system": "Windows", "vendor": "Microsoft"}}
{"ip":"192.168.101.29", "ipinfo":{"ports": {"22": {"state": "open", "name": "ssh", "product": "OpenSSH", "version": "8.1p1 Debian 1", "extrainfo": "protocol 2.0"}}, "system": "Linux", "vendor": "Linux"}}
{"ip":"192.168.101.26", "ipinfo":{"ports": {"22": {"state": "open", "name": "ssh", "product": "OpenSSH", "version": "7.6p1 Ubuntu 4ubuntu0.3", "extrainfo": "Ubuntu Linux; protocol 2.0"}, "5000": {"state": "open", "name": "http", "product": "Tornado httpd", "version": "6.1", "extrainfo": ""}, "5001": {"state": "open", "name": "http", "product": "BaseHTTPServer", "version": "0.6", "extrainfo": "Python 4.8.5"}, "9999": {"state": "open", "name": "http", "product": "Tornado httpd", "version": "6.1", "extrainfo": ""}, "27017": {"state": "open", "name": "mongodb", "product": "MongoDB", "version": "4.2.3", "extrainfo": ""}}, "system": "Linux", "vendor": "Linux"}}
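Since every line of nmapscan.txt is a standalone JSON object, loading it can be sketched as a line-by-line parse. The helper name load_nmap_records is mine, not from the original code:

```python
import json

def load_nmap_records(path):
    """Parse nmapscan.txt, where each non-blank line is one JSON record."""
    records = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:                      # skip blank lines
                records.append(json.loads(line))
    return records
```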
An online JSON formatter can be used to pretty-print such JSON strings.
pocs.json is the target to match against:
{
"Apache Solr_Unauthorized": {
"rule": "Solr,Apache Solr",
"port": "8983",
"path": "component/apache solr/Apache Solr_Unauthorized.un",
"create": "system"
},
"CouchDB_Unauthorized": {
"rule": "CouchDB",
"port": "5984",
"path": "component/couchdb/CouchDB_Unauthorized.un",
"create": "system"
},
"Docker_Unauthorized": {
"rule": "docker",
"port": "2375",
"path": "component/docker/Docker_Unauthorized.un",
"create": "system"
},
"Domino_Unauthorized": {
"rule": "domino",
"port": "1352",
"path": "component/domino/Domino_Unauthorized.un",
"create": "system"
},
"Elastic_Unauthorized": {
"rule": "ElasticSearch",
"port": "9200",
"path": "component/elasticsearch/Elastic_Unauthorized.un",
"create": "system"
},
"Esccms_Unauthorized": {
"rule": "esccms",
"port": "80",
"path": "cms/esccms/Esccms_Unauthorized.un",
"create": "system"
},
...
}
When matching, the goal is to match as many entries as possible while remaining precise. Consider the following three examples:
"1521": {
"state": "open",
"name": "oracle-tns",
"product": "Oracle TNS listener",
"version": "11.2.0.2.0",
"extrainfo": "unauthorized"
},
Matching by name and product alone gives: Oracle-GlassFish-Server-Open-Source-Edition_DirectTraver.
However, the pocs.json entry for Oracle-GlassFish-Server-Open-Source-Edition_DirectTraver lists port 80, not 1521, so the final result should be empty. Reporting Oracle-GlassFish-Server-Open-Source-Edition_DirectTraver here would be imprecise.
"389": { "state": "open", "name": "ldap", "product": "Microsoft Windows Active Directory LDAP", "version": "", "extrainfo": "Domain: huaun.com, Site: Default-First-Site-Name" },
The final match is "LDAP_Unauthorized" from pocs.json; no Windows-related pocs are included.
"8009": { "state": "open", "name": "ajp13", "product": "Apache Jserv", "version": "", "extrainfo": "Protocol v1.3" },
The final match is "Apache-Tomcat-Ajp_File-Read" from pocs.json.
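Taken together, the three examples suggest that a precise match requires both conditions at once: a rule keyword appears in the service's name/product, and the poc's port equals the scanned port. The following is a minimal sketch under that reading; the AND combination and the case-insensitive substring test are my interpretation of the examples, not stated explicitly in the note:

```python
def match_pocs(nmap_record, pocs):
    """Match one parsed nmapscan.txt record against the pocs.json dict.

    A poc key is reported only when one of its comma-separated rule
    keywords occurs in the service's name/product text AND the poc's
    port equals the scanned port (assumption based on the examples).
    """
    hits = []
    for port, info in nmap_record["ipinfo"]["ports"].items():
        service_text = (info["name"] + " " + info["product"]).lower()
        for poc_key, poc in pocs.items():
            keywords = [k.strip().lower() for k in poc["rule"].split(",")]
            keyword_hit = any(k and k in service_text for k in keywords)
            if keyword_hit and port == poc["port"]:
                hits.append(poc_key)
    return hits
```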
Summary: the content of nmapscan.txt varies with the user's environment (more samples will be provided later), while pocs.json is essentially fixed.
Even when the program runs correctly, the hit rate is far too low; almost no records are matched.
First, extract the key content to be matched from pocs.json and nmapscan.txt. Running the code in Appendix 1 on the two sample files yields the data below (note: the poc keys and the nmap ports are not printed):
****************************** pocs: rule (20 of 817) ******************************
Solr,Apache Solr
CouchDB
docker
domino
ElasticSearch
esccms
Hadoop
influxDB
jboss
jenkins
joomla
mongodb
Redis
rsync
Zookeeper
Memcached
activemq
axis
cerio
gtafana
****************************** nmap: name, product (total: 5) ******************************
ssh, OpenSSH
http, Tornado httpd
http, BaseHTTPServer
http, Tornado httpd
mongodb, MongoDB
Observation shows that the two sides rarely match because the strings are not exactly equal. For example, one product value is Tornado httpd, a two-word combination, while another is BaseHTTPServer; neither is exactly equal to any dictionary entry, so the match fails.
Based on the above, two solutions are proposed.
Solution 1: split the input into a list of words and match word by word. Generic words such as httpd carry no identifying information and should be dropped. Concretely, build a filter-word list holding all the unwanted words; during matching, any input word found in the filter list is simply discarded.
Solution 2: use string-similarity computation. Compare the two not-quite-equal strings, set a threshold on the returned similarity value, and accept only matches whose similarity exceeds the threshold. There are many ways to compute string similarity; two methods are given below for reference.
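Solution 1 can be sketched as follows: tokenize the product string, drop words found in the filter list, and test the remaining words against the rule keywords. Only httpd appears as a filter word in this note; the other STOP_WORDS entries are illustrative assumptions:

```python
# Filter-word list: generic words that carry no identifying information.
# Only "httpd" comes from the note; the rest are illustrative.
STOP_WORDS = {"httpd", "server", "service", "daemon"}

def tokenize(text):
    """Lower-case the text, split on whitespace, and drop filter words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def word_match(product, rule):
    """True if any remaining product word occurs in any rule keyword."""
    keywords = [k.strip().lower() for k in rule.split(",")]
    return any(w in k for w in tokenize(product) for k in keywords)
```

With this, "Tornado httpd" reduces to the single word tornado, which can then match a rule such as "Tornado" directly.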
Method 1: Jaccard similarity coefficient. (Theory omitted.)
def jsc(str1, str2):
    '''
    :type str1: the first string
    :type str2: the second string
    :rtype: float, the similarity of the two, in the interval [0, 1]
    '''
    # Collect all distinct characters that appear in either string.
    all_chars = []
    for ch in str1:
        if ch not in all_chars:
            all_chars.append(ch)
    for ch in str2:
        if ch not in all_chars:
            all_chars.append(ch)
    # If both strings are empty there is nothing to compare.
    if not all_chars:
        return 0.
    # Count the characters common to both strings.
    common_chars = []
    for ch in all_chars:
        if ch in str1 and ch in str2:
            common_chars.append(ch)
    # The ratio of common to total characters is the similarity.
    return 1. * len(common_chars) / len(all_chars)
Calling it:

str1 = "hello"
strs = ["hello", " hello", "hello1", " hello1", "hel1lo", ""]
for str2 in strs:
    print(str1 + " : " + str2, " = ", jsc(str1, str2))
Running this produces:
hello : hello = 1.0
hello : hello = 0.8
hello : hello1 = 0.8
hello : hello1 = 0.6666666666666666
hello : hel1lo = 0.8
hello : = 0.0
Method 2: edit distance. (Theory omitted.)
The code:
def jdcompare(str1, str2):
    '''
    :type str1: the first string
    :type str2: the second string
    :rtype: float, the similarity of the two, in the interval [0, 1]
    '''
    m = len(str1)  # length of the first string
    n = len(str2)  # length of the second string
    # If either length is 0, the similarity is 0.
    if m == 0:
        return 0.
    if n == 0:
        return 0.
    # Initialize the matrix: row 0 and column 0 hold the base distances.
    matrix = [[i + j for j in range(n + 1)] for i in range(m + 1)]
    # Fill the matrix with the standard edit-distance recurrence.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            t = 0 if str1[i - 1] == str2[j - 1] else 1
            v1 = matrix[i - 1][j] + 1      # deletion
            v2 = matrix[i][j - 1] + 1      # insertion
            v3 = matrix[i - 1][j - 1] + t  # substitution (or match)
            matrix[i][j] = min(v1, v2, v3)
    # Convert the distance into a similarity value before returning.
    return 1.0 - 1.0 * matrix[m][n] / max(m, n)
Usage and results:

str1 = "hello"
strs = ["hello", " hello", "hello1", " hello1", "hel1lo", ""]
for str2 in strs:
    print(str1 + " : " + str2, " = ", jdcompare(str1, str2))
Running this produces:
hello : hello = 1.0
hello : hello = 0.8333333333333334
hello : hello1 = 0.8333333333333334
hello : hello1 = 0.7142857142857143
hello : hel1lo = 0.8333333333333334
hello : = 0.0
Method 3: Python's difflib library also provides comparison functions, for example:
import difflib as df

def compare(s1, s2):
    return df.SequenceMatcher(None, s1, s2).quick_ratio()

print(compare("hello", "hellO1"))
# Output:
# 0.7272727272727273
Using the same test harness:

str1 = "hello"
strs = ["hello", " hello", "hello1", " hello1", "hel1lo", ""]
for str2 in strs:
    print(str1 + " : " + str2, " = ", compare(str1, str2))

Running this produces:
hello : hello = 1.0
hello : hello = 0.9090909090909091
hello : hello1 = 0.9090909090909091
hello : hello1 = 0.8333333333333334
hello : hel1lo = 0.9090909090909091
hello : = 0.0
When splitting into words, some inputs have no separators at all, e.g. BaseHTTPServer. There are two common solutions: dictionary-based and NLP-based. Since the second is considerably more involved, the dictionary-based method is recommended. Its principle: build a dictionary of common concatenated spellings, and whenever a dictionary word occurs in the input, split it off at the character level.
For example, with dic = {'Base', ...}, processing BaseHTTPServer extracts Base, leaving HTTPServer; if the dictionary also contains HTTP or Server, those are extracted the same way.
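The dictionary-based split described above can be sketched as a greedy longest-match scan. The function name dict_split and the sample vocabulary are illustrative, not from the original code:

```python
def dict_split(text, vocab):
    """Greedily split an unspaced string like 'BaseHTTPServer' into
    known words, always trying the longest dictionary match first."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # longest candidate first
            if text[i:j] in vocab:
                words.append(text[i:j])
                i = j
                break
        else:
            i += 1                           # skip chars not in any word
    return words
```

For instance, with vocab = {"Base", "HTTP", "Server"}, dict_split("BaseHTTPServer", vocab) yields ["Base", "HTTP", "Server"], which can then be matched word by word as in Solution 1.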
Testing shows that similarity estimation (Solution 2) on its own sometimes gives incorrect results. According to 刘加勇's test results:
With the following test data, IIS is not the highest-scoring match for Microsoft IIS httpd.
str1s = ["Microsoft IIS httpd", "Oracle WebLogic Server"]
str2s = ["WebLogic-Deserialization_RCE(CVE-2019-2725)", "Weblogic-XMLDecoder_RCE",
         "Weblogic-WLS_RCE(CVE-2020-2551)", "Weblogic-Wls9_RCE(CVE-2019-2729)",
         "Weblogic-WLS_RCE(CVE-2018-2893)", "Weblogic-WLS_RCE(CVE-2018-2628)",
         "Weblogic-WLS-Core_RCE", "Weblogic_SSRF", "IIS"]
/Users/yong/Projects/tensorflow/venv/bin/python /Users/yong/Projects/test/杰卡德相似系数评估方法.py
Microsoft IIS httpd : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.15789473684210525
Microsoft IIS httpd : Weblogic-XMLDecoder_RCE = 0.2222222222222222
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2020-2551) = 0.125
Microsoft IIS httpd : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.125
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2893) = 0.11764705882352941
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2628) = 0.12121212121212122
Microsoft IIS httpd : Weblogic-WLS-Core_RCE = 0.2
Microsoft IIS httpd : Weblogic_SSRF = 0.18181818181818182
Microsoft IIS httpd : IIS = 0.14285714285714285
Oracle WebLogic Server : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.3235294117647059
Oracle WebLogic Server : Weblogic-XMLDecoder_RCE = 0.4166666666666667
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2020-2551) = 0.37037037037037035
Oracle WebLogic Server : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.27586206896551724
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2893) = 0.3448275862068966
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2628) = 0.35714285714285715
Oracle WebLogic Server : Weblogic-WLS-Core_RCE = 0.55
Oracle WebLogic Server : Weblogic_SSRF = 0.5
Oracle WebLogic Server : IIS = 0.0625
Process finished with exit code 0
/Users/yong/Projects/tensorflow/venv/bin/python /Users/yong/Projects/test/编辑距离算法.py
Microsoft IIS httpd : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.09302325581395354
Microsoft IIS httpd : Weblogic-XMLDecoder_RCE = 0.04347826086956519
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2020-2551) = 0.06451612903225812
Microsoft IIS httpd : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.09375
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2893) = 0.06451612903225812
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2628) = 0.06451612903225812
Microsoft IIS httpd : Weblogic-WLS-Core_RCE = 0.04761904761904767
Microsoft IIS httpd : Weblogic_SSRF = 0.10526315789473684
Microsoft IIS httpd : IIS = 0.1578947368421053
Oracle WebLogic Server : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.06976744186046513
Oracle WebLogic Server : Weblogic-XMLDecoder_RCE = 0.13043478260869568
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2020-2551) = 0.032258064516129004
Oracle WebLogic Server : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.0625
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2893) = 0.032258064516129004
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2628) = 0.032258064516129004
Oracle WebLogic Server : Weblogic-WLS-Core_RCE = 0.13636363636363635
Oracle WebLogic Server : Weblogic_SSRF = 0.36363636363636365
Oracle WebLogic Server : IIS = 0.045454545454545414
Process finished with exit code 0
/Users/yong/Projects/tensorflow/venv/bin/python /Users/yong/Projects/test/difflib库.py
Microsoft IIS httpd : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.22580645161290322
Microsoft IIS httpd : Weblogic-XMLDecoder_RCE = 0.3333333333333333
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2020-2551) = 0.16
Microsoft IIS httpd : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.1568627450980392
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2893) = 0.16
Microsoft IIS httpd : Weblogic-WLS_RCE(CVE-2018-2628) = 0.16
Microsoft IIS httpd : Weblogic-WLS-Core_RCE = 0.3
Microsoft IIS httpd : Weblogic_SSRF = 0.25
Microsoft IIS httpd : IIS = 0.2727272727272727
Oracle WebLogic Server : WebLogic-Deserialization_RCE(CVE-2019-2725) = 0.4
Oracle WebLogic Server : Weblogic-XMLDecoder_RCE = 0.5777777777777777
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2020-2551) = 0.37735849056603776
Oracle WebLogic Server : Weblogic-Wls9_RCE(CVE-2019-2729) = 0.2962962962962963
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2893) = 0.37735849056603776
Oracle WebLogic Server : Weblogic-WLS_RCE(CVE-2018-2628) = 0.37735849056603776
Oracle WebLogic Server : Weblogic-WLS-Core_RCE = 0.5581395348837209
Oracle WebLogic Server : Weblogic_SSRF = 0.5142857142857142
Oracle WebLogic Server : IIS = 0.08
Process finished with exit code 0
None of the three methods met expectations, and even with algorithmic changes there is no guarantee the task can be completed. The root cause is the variety of inputs: some text is regular and some is not, so no single algorithm can solve every case. The recommendation is therefore to combine several approaches: first use regular expressions and Solution 1 to handle the regular data, then apply the similarity computation of Solution 2 to the remainder. This should solve the problem to a reasonable degree.
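The combined approach suggested above can be sketched as a two-stage matcher: word-level matching first, with a similarity fallback for whatever remains. The helper name combined_match and the 0.6 threshold are assumptions for illustration:

```python
import difflib

def combined_match(product, rules, threshold=0.6):
    """Two-stage matching sketch: exact word membership first, then a
    difflib similarity fallback (threshold value is an assumption)."""
    words = product.lower().split()
    # Stage 1: any whole word of the product equals a rule keyword.
    for rule in rules:
        keywords = [k.strip().lower() for k in rule.split(",")]
        if any(w in keywords for w in words):
            return rule
    # Stage 2: fall back to similarity scoring; keep the best rule
    # whose score exceeds the threshold, or None if nothing qualifies.
    best_rule, best_score = None, threshold
    for rule in rules:
        score = difflib.SequenceMatcher(None, product.lower(), rule.lower()).ratio()
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule
```

Stage 1 catches the regular cases such as Microsoft IIS httpd vs IIS that pure similarity scoring mishandles; stage 2 only runs when the word-level pass finds nothing.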
Appendix 1. The code below extracts the required content from pocs.json and nmapscan.txt: read_lines returns all lines of a file; the next block extracts the port and rule of every poc entry; the following block extracts the port, name, and product of every nmap entry; the final block prints both sets.
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 25 21:31:52 2020

@author: hao
"""
import json

def read_lines(filepath):
    '''Loads and returns all lines of text from a file.'''
    lines = []
    with open(filepath, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return lines

# file1 and file2 store the poc data and the nmap data, respectively.
file1 = r'd:\data\pocs.json'
file2 = r'd:\data\nmapscan.txt'

# Read pocs.json and parse it as a single JSON object.
pocs = json.loads(''.join(read_lines(file1)))

# Collect (port, rule) pairs from every poc entry.
poc_rules = []
for item in pocs:
    data1 = pocs[item]
    poc_rules.append((data1['port'], data1['rule']))  # (port, rule)

# Parse one line (index lineno) of the nmap file as JSON.
lineno = 3
nmaps_ports = json.loads(read_lines(file2)[lineno])['ipinfo']['ports']

# Retrieve port, name and product from "ipinfo.ports".
nmaps_name_product = []
for item in nmaps_ports:
    # item is the port; 'name' and 'product' are matched against 'rule'.
    nmaps_name_product.append((item, nmaps_ports[item]['name'], nmaps_ports[item]['product']))

# Show both kinds of data.
showcount = 20  # print the first showcount entries of the full list
print('{0} pocs: rule ({1} of {2}) {0}'.format('*' * 30, showcount, len(poc_rules)))
for item in poc_rules[0:showcount]:
    print(item[1])
print('{0} nmap: name, product (total: {1}) {0}'.format('*' * 30, len(nmaps_name_product)))
for item in nmaps_name_product:
    print(item[1] + ', ' + item[2])