JSON to PCAP
Pcap Masking / Anonymization
A script that reconstructs a pcap from tshark JSON output and allows packet modifications along the way. The script can also be used for pcap anonymization.
The source code is located on GitHub: https://github.com/H21lab/json2pcap
The above repository may contain more recent changes than the copy in the Wireshark repository (https://gitlab.com/wireshark/wireshark/-/blob/master/tools/json2pcap/json2pcap.py).
The tshark options -T json -x and -T jsonraw add the raw hex data of every field to the JSON output, together with the position at which the field was dissected in the original frame, the field length, the bitmask (for fields that are not byte aligned) and the type. This information can be used for later processing. One use case is the json2pcap script included in Wireshark, which assembles the protocol layers back together from the upper to the lower layers. This allows reverse JSON-to-pcap conversion as well as packet modification, editing and rewriting.
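As an illustration of how the position, length and bitmask can be used, the following minimal Python sketch reads a field value back out of the frame bytes. The `RawField` and `extract` names are hypothetical and not part of json2pcap; the sample values are taken from the DNS example later in this document, and the element order (hex value, position, length, bitmask, type) is assumed from that example.

```python
from typing import NamedTuple


class RawField(NamedTuple):
    """One *_raw entry, in the order seen in the DNS example (assumed layout)."""
    hex_value: str   # field value as a hex string
    position: int    # byte offset of the field in the frame
    length: int      # field length in bytes
    bitmask: int     # 0 when the field is byte aligned
    field_type: int  # type code reported by tshark


def extract(frame_hex: str, field: RawField) -> int:
    """Read a field value from the frame using position, length and bitmask."""
    start = field.position * 2  # two hex characters per byte
    value = int(frame_hex[start:start + field.length * 2], 16)
    if field.bitmask:
        # Shift right by the number of trailing zero bits of the mask.
        shift = (field.bitmask & -field.bitmask).bit_length() - 1
        value = (value & field.bitmask) >> shift
    return value


# Frame from the DNS example, split per layer: eth, ip, udp, dns.
FRAME = (
    "00c09f32418c00e018b10cad0800"
    "450000380000400040116547c0a8aa08c0a8aa14"
    "801b0035002485ed"
    "10320100000100000000000006676f6f676c6503636f6d0000100001"
)

dns_id = RawField("1032", 42, 2, 0, 5)
rec_desired = RawField("1", 44, 2, 256, 2)  # bitmask 0x0100 inside dns.flags

print(hex(extract(FRAME, dns_id)))
print(extract(FRAME, rec_desired))
```

For both fields the extracted number matches the hex value stored in the JSON entry, which is a quick way to check that the position and bitmask are being interpreted as intended.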
usage: json2pcap.py [-h] [--version] [-i [INFILE]] -o OUTFILE [-p] [-m MASKED_FIELD] [-a ANONYMIZED_FIELD] [-s SALT] [-v]
json2pcap 1.2
Utility to generate pcap from json format.
Packet modification:
In the input json it is possible to modify the raw values of decoded fields.
The output pcap will include the modified values. The algorithm for
generating the output pcap is to get all raw hex fields from the input json and
then assemble them by layering from longest (less decoded fields) to
shortest (more decoded fields). It means that if the modified raw field is a
shorter field (more decoded field), it takes precedence over a modification
in a longer field (less decoded field). If the json includes duplicated raw
fields with the same position and length, the behavior is not deterministic.
For manual packet editing it is always possible to remove any raw fields that
are not required from the json; only the frame_raw field is mandatory for reconstruction.
Packet modification with -p switch:
A python script is generated instead of a pcap. This python script, when
executed, will generate the pcap of the 1st packet from the input json. The
generated code includes the decoded fields and the function to assemble the
packet. This makes it possible to modify the script and programmatically edit or
encode the packet variables. The assembling algorithm is different, because
the decoded packet fields are relative and point to the parent node with their
position (compared to the input json, which has absolute positions).
Pcap masking and anonymization with -m and -a switch:
The script allows to mask or anonymize the selected json raw fields. If the
fields are selected and located on lower protocol layers, they are not
overwritten by upper fields which are not marked by these switches.
The pcap masking and anonymization can be performed in the following way:
tshark -r orig.pcap -T json -x --no-duplicate-keys | \
    python json2pcap.py -m "ip.src_raw" -a "ip.dst_raw" -o anonymized.pcap
In this example the ip.src_raw field is masked with ffffffff by byte values
and ip.dst_raw is hashed by randomly generated salt.
Additionally the following syntax is valid to anonymize portion of field
tshark -r orig.pcap -T json -x --no-duplicate-keys | \
    python json2pcap.py -m "ip.src_raw[2:]" -a "ip.dst_raw[:-2]" -o anonymized.pcap
Where the src_ip first byte is preserved and dst_ip last byte is preserved.
And the same can be achieved by
tshark -r orig.pcap -T json -x --no-duplicate-keys | \
    python json2pcap.py -m "ip.src_raw[2:8]" -a "ip.dst_raw[0:6]" -o anonymized.pcap
Masking and anonymization limitations are mainly the following:
- In case tshark is performing reassembly from multiple frames, the
  backward pcap reconstruction is not properly performed and can result in
  malformed frames.
- The new values in the fields could violate the field format, as
  json2pcap is not performing correct protocol encoding with respect to
  allowed values of the target field and field encoding.
optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -i [INFILE], --infile [INFILE]
                        json generated by tshark -T json -x or by tshark -T
                        jsonraw (not preserving frame timestamps). If no
                        input file is specified script reads from stdin.
  -o OUTFILE, --outfile OUTFILE
                        output pcap filename
  -p, --python          generate python payload instead of pcap (only 1st
                        packet)
  -m MASKED_FIELD, --mask MASKED_FIELD
                        mask the specific raw field (e.g. -m "ip.src_raw" -m
                        "ip.dst_raw[2:6]")
  -a ANONYMIZED_FIELD, --anonymize ANONYMIZED_FIELD
                        anonymize the specific raw field (e.g. -a
                        "ip.src_raw[2:]" -a "ip.dst_raw[:-2]")
  -s SALT, --salt SALT  salt use for anonymization. If no value is provided
                        it is randomized.
  -v, --verbose         verbose output
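The layering rule described under "Packet modification" in the help text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script's actual code: the `collect_raw` and `assemble` helpers and the tiny `LAYERS` sample are hypothetical, only the plain five-element form of a raw field is handled, and fields with a bitmask are skipped for brevity.

```python
def collect_raw(node, out):
    """Gather every *_raw entry from the nested layers dictionary."""
    if isinstance(node, dict):
        for key, value in node.items():
            is_field = (
                key.endswith("_raw")
                and isinstance(value, list)
                and value
                and isinstance(value[0], str)
            )
            if is_field:
                out.append(value)
            else:
                collect_raw(value, out)
    elif isinstance(node, list):
        for item in node:
            collect_raw(item, out)


def assemble(layers):
    """Rebuild the frame by applying raw fields from longest to shortest."""
    fields = []
    collect_raw(layers, fields)
    frame = bytearray.fromhex(layers["frame_raw"][0])
    # Longest first, so shorter (more decoded) fields are written last and win.
    for hex_value, position, length, bitmask, _ftype in sorted(
        fields, key=lambda f: f[2], reverse=True
    ):
        if bitmask or len(hex_value) != 2 * length:
            continue  # this sketch ignores bit fields and malformed entries
        frame[position:position + length] = bytes.fromhex(hex_value)
    return bytes(frame)


# Hypothetical sample: an 8-byte frame, one 4-byte layer and one 1-byte field.
LAYERS = {
    "frame_raw": ["0011223344556677", 0, 8, 0, 1],
    "a": {"a.x_raw": ["ee", 3, 1, 0, 1]},
    "a_raw": ["aabbccdd", 2, 4, 0, 1],
}

print(assemble(LAYERS).hex())
```

In the sample, the one-byte field `a.x_raw` overlaps the longer `a_raw` field and its value ends up in the result, regardless of the order in which the keys appear in the JSON.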
Pcap anonymization
Pcap anonymization can be performed in the following way:
All fields that require anonymization should be specified with the -a switch. In the output pcap these fields are replaced by values hashed with the salt; fields specified with the -m switch are instead masked with 0xff bytes. To identify the names of the raw fields, open the JSON file generated with the -T jsonraw option.
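The field selectors accepted by -m and -a, including the optional slice such as "ip.src_raw[2:]", can be illustrated with the short Python sketch below. The slice is applied to the hex characters of the raw value, as implied by the help examples in which [2:] keeps the first byte of an IPv4 address. The helper names are hypothetical, and the use of SHA-256 over salt plus original value is an assumption made for the sketch; the hash actually used by json2pcap may differ.

```python
import hashlib
import re

# name, optionally followed by [start:end] with either bound omitted
SPEC = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<start>-?\d+)?:(?P<end>-?\d+)?\])?$")


def parse_spec(spec):
    """Split 'ip.src_raw[2:]' into the field name and a Python slice."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError("invalid field selector: " + spec)
    start = int(match["start"]) if match["start"] else None
    end = int(match["end"]) if match["end"] else None
    return match["name"], slice(start, end)


def mask(hex_value, part):
    """Overwrite the selected hex characters with 'f'."""
    chars = list(hex_value)
    for i in range(*part.indices(len(chars))):
        chars[i] = "f"
    return "".join(chars)


def anonymize(hex_value, part, salt):
    """Overwrite the selected hex characters with digits of a salted hash."""
    # Assumption: SHA-256 over salt + original value, reused cyclically.
    digest = hashlib.sha256((salt + hex_value).encode()).hexdigest()
    chars = list(hex_value)
    for n, i in enumerate(range(*part.indices(len(chars)))):
        chars[i] = digest[n % len(digest)]
    return "".join(chars)


name, part = parse_spec("ip.src_raw[2:]")
print(name, mask("c0a8aa08", part))

name, part = parse_spec("ip.dst_raw[:-2]")
print(name, anonymize("c0a8aa14", part, "iSaiU7Y6biYxAEeVbP77"))
```

With a fixed salt the same input always maps to the same output, so relationships between packets (for example, the same destination address) stay recognisable in the anonymized capture.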
Anonymization example for SIP protocol
1. Download the original SIP_CALL_RTP_G711 pcap from the Wireshark sample captures
2. Run the following command
tshark -Y sip -r ./SIP_CALL_RTP_G711 -T json -x --no-duplicate-keys | \
    python json2pcap.py -a "ip.src_raw" -a "ip.dst_raw" -a "sip.from.user_raw" \
    -a "sip.to.user_raw" -a "sip.contact.uri_raw" -a "sip.contact.user_raw" \
    -a "sip.r-uri.user_raw" -a "sip.display.info_raw" --salt "iSaiU7Y6biYxAEeVbP77" \
    -o ./SIP_CALL_RTP_G711_anonymized.pcap
3. This will produce SIP_CALL_RTP_G711_anonymized.pcap as seen on the screenshots.
Original SIP_CALL_RTP_G711.pcap
SIP_CALL_RTP_G711_anonymized.pcap
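Field names such as those passed to -a in the command above can also be collected programmatically instead of reading the JSON by eye. The following is a minimal Python sketch; the `raw_field_names` helper and the inline sample document are hypothetical and only mimic the nesting of tshark JSON output.

```python
import json


def raw_field_names(node, found=None):
    """Return the set of all keys ending in '_raw' in a nested JSON structure."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("_raw"):
                found.add(key)
            raw_field_names(value, found)
    elif isinstance(node, list):
        for item in node:
            raw_field_names(item, found)
    return found


# Hypothetical, heavily shortened sample in the shape of tshark JSON output.
SAMPLE = json.loads("""
[{"_source": {"layers": {
    "frame_raw": ["c0a8aa08c0a8aa14", 0, 8, 0, 1],
    "ip": {
        "ip.src_raw": ["c0a8aa08", 0, 4, 0, 32],
        "ip.dst_raw": ["c0a8aa14", 4, 4, 0, 32]
    }
}}}]
""")

for name in sorted(raw_field_names(SAMPLE)):
    print(name)
```

For a real capture, the same function could be applied to the result of `json.load` on the file written by tshark.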
Pcap modification example
1. Download the original dns.cap from the Wireshark sample captures
2. Create json from pcap
tshark -T jsonraw -J "dns" -r dns.cap > dns.cap.json
3. Modify dns.cap.json
vi dns.cap.json
{
  "_index": "packets-2017-02-27",
  "_type": "pcap_file",
  "_score": null,
  "_source": {
    "layers": {
      "frame_raw": ["00c09f32418c00e018b10cad0800450000380000400040116547c0a8aa08c0a8aa14801b0035002485ed10320100000100000000000006676f6f676c6503636f6d0000100001", 0, 70, 0, 1],
      "frame": {
        "filtered": "frame"
      },
      "eth_raw": ["00c09f32418c00e018b10cad0800", 0, 14, 0, 1],
      "eth": {
        "filtered": "eth"
      },
      "ip_raw": ["450000380000400040116547c0a8aa08c0a8aa14", 14, 20, 0, 1],
      "ip": {
        "filtered": "ip"
      },
      "udp_raw": ["801b0035002485ed", 34, 8, 0, 1],
      "udp": {
        "filtered": "udp"
      },
      "dns_raw": ["10320100000100000000000006676f6f676c6503636f6d0000100001", 42, 28, 0, 1],
      "dns": {
        "dns.id_raw": ["1032", 42, 2, 0, 5],
        "dns.flags_raw": ["0100", 44, 2, 0, 5],
        "dns.flags_tree": {
          "dns.flags.response_raw": ["0", 44, 2, 32768, 2],
          "dns.flags.opcode_raw": ["0", 44, 2, 30720, 5],
          "dns.flags.truncated_raw": ["0", 44, 2, 512, 2],
          "dns.flags.recdesired_raw": ["1", 44, 2, 256, 2],
          "dns.flags.z_raw": ["0", 44, 2, 64, 2],
          "dns.flags.checkdisable_raw": ["0", 44, 2, 16, 2]
        },
        "dns.count.queries_raw": ["0001", 46, 2, 0, 5],
        "dns.count.answers_raw": ["0000", 48, 2, 0, 5],
        "dns.count.auth_rr_raw": ["0000", 50, 2, 0, 5],
        "dns.count.add_rr_raw": ["0000", 52, 2, 0, 5],
        "Queries": {
          "google.com: type TXT, class IN": {
            "dns.qry.name_raw": ["0667676767676703636f6d00", 54, 12, 0, 26],
4. json2pcap.py to generate pcap
The raw fields are flattened and the frame is created from them. Shorter raw fields overwrite longer ones.
./wireshark/tools/json2pcap/json2pcap.py dns.cap.json
5. (OPTIONAL) json2pcap.py to generate python payload instead of pcap.
The generated payload uses positions relative to the parent fields, and the pcap reconstruction algorithm is different because it follows the parent and child hierarchy. This can be useful for later, more complex encoding (e.g. using scapy, dpkt, pycrate/libmich or other libraries).
./wireshark/tools/json2pcap/json2pcap.py -p dns.cap.json
6. New pcap dns.cap.json.pcap
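To make the end result of the steps above more concrete, the Python sketch below applies the modified dns.qry.name_raw value from step 3 to the original frame and wraps the result in the classic pcap file layout (a 24-byte global header followed by a 16-byte record header per packet). It is an illustration only: the `pcap_bytes` helper is hypothetical, the timestamp of 0 is a placeholder, the UDP checksum is not recomputed, and the way json2pcap itself writes the file may differ.

```python
import struct


def pcap_bytes(frames, linktype=1, snaplen=65535):
    """Serialize (ts_sec, ts_usec, frame) tuples into classic pcap format."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type
    # (1 = Ethernet). Little-endian, microsecond timestamps.
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, linktype)
    for ts_sec, ts_usec, frame in frames:
        # Record header: timestamp, captured length, original length.
        out += struct.pack("<IIII", ts_sec, ts_usec, len(frame), len(frame))
        out += frame
    return out


# Original frame from the DNS example, split per layer: eth, ip, udp, dns.
ORIGINAL = (
    "00c09f32418c00e018b10cad0800"
    "450000380000400040116547c0a8aa08c0a8aa14"
    "801b0035002485ed"
    "10320100000100000000000006676f6f676c6503636f6d0000100001"
)

# Modified dns.qry.name_raw entry from step 3: hex value, position, length.
name_hex, position, length = "0667676767676703636f6d00", 54, 12

frame = bytearray.fromhex(ORIGINAL)
frame[position:position + length] = bytes.fromhex(name_hex)

data = pcap_bytes([(0, 0, bytes(frame))])
print(len(data), bytes(frame[position:position + length]))
```

The resulting bytes could be written to a file opened in binary mode and then inspected with Wireshark or tshark; the query name in the frame now reads gggggg.com instead of google.com.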