2503.04404v2

# Temporal Analysis of NetFlow Datasets for Network Intrusion Detection Systems (2025)

## Abstract

This paper investigates the temporal analysis of NetFlow datasets for machine learning (ML)-based network intrusion detection systems (NIDS). Although many previous studies have highlighted the critical role of temporal features, such as inter-packet arrival time and flow length/duration, in NIDS, the currently available NetFlow datasets for NIDS lack these features. This study addresses this gap by creating and making publicly available a set of NetFlow datasets that incorporate these temporal features [1]. With these features, we provide a comprehensive temporal analysis of NetFlow datasets by examining the distribution of various features over time and presenting time-series representations of NetFlow features. Such a temporal analysis has not previously been provided in the literature. We also borrow an idea from signal processing, time-frequency analysis, and test how much the time-frequency signal representations (TFSPs) differ across attacks. The results indicate that many attacks have unique patterns, which could help ML models identify them more easily.

## 1 Introduction

Maintaining the security and integrity of network infrastructures has become increasingly challenging due to the constantly evolving nature of cyber threats and the vast scale and complexity of modern networks. A critical component of network security is traffic monitoring, which provides essential information on potential threats, anomalies, and vulnerabilities. However, the sheer volume of network traffic has made traditional packet inspection impractical: it demands immense processing power and storage while also raising significant privacy concerns [2]. A practical solution adopted by many organisations is flow-based network monitoring [3].
This approach aggregates traffic into summarised flows that capture key communication patterns between endpoints, allowing efficient analysis, reduced resource demands, and improved privacy protection while still enabling robust threat detection and network management [4].

Network Intrusion Detection Systems (NIDSs) are a vital component of the network security ecosystem, providing real-time monitoring and analysis of network traffic to identify suspicious activities, unauthorised access attempts, and potential security breaches [5]. NIDSs are commonly classified into two main types: signature-based and anomaly-based [6]. Signature-based NIDSs rely on databases of known attack signatures that require regular updates [7]. They achieve high accuracy for known attacks but struggle with variations of those attacks, polymorphic malware, and zero-day exploits [8]. In contrast, anomaly-based NIDSs use algorithms that learn from traffic patterns, enabling them to adapt to emerging threats and detect deviations from normal behaviour [9]. To enhance detection capabilities, many modern NIDSs integrate machine learning (ML) techniques into both anomaly-based and hybrid approaches [6, 10]. NIDSs that integrate trained ML models are referred to as ML-based NIDSs [11]. ML-based NIDSs learn patterns in network traffic and improve anomaly detection by distinguishing between normal and malicious behaviour [12, 13]. However, their effectiveness depends heavily on the quality and relevance of the datasets used for training and evaluation [14]. In this context, flow-based network monitoring provides a practical solution by summarising traffic into flows, offering a structured representation of network activity that facilitates both training and real-time anomaly detection.
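To make the idea of summarising packets into flows concrete, here is a minimal Python sketch; it is not the exporter used in this work. It groups packets by their 5-tuple into unidirectional flows and keeps first/last timestamps plus packet and byte counts. The packet tuple layout and the names `Flow` and `aggregate_flows` are hypothetical. Production exporters such as nProbe also build bidirectional flows and apply flow timeouts, both of which are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Summary of one unidirectional flow (hypothetical minimal record)."""
    first_ms: int
    last_ms: int
    n_packets: int = 0
    n_bytes: int = 0

    @property
    def duration_ms(self) -> int:
        return self.last_ms - self.first_ms


def aggregate_flows(packets):
    """Group packets into flows keyed by the 5-tuple.

    Each packet is assumed to be a tuple:
    (timestamp_ms, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes).
    No idle/active timeouts are applied, so every packet sharing a 5-tuple
    ends up in the same flow.
    """
    flows = {}
    # Tuples sort by their first element, i.e. by timestamp.
    for ts, src, dst, sport, dport, proto, length in sorted(packets):
        key = (src, dst, sport, dport, proto)
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = Flow(first_ms=ts, last_ms=ts)
        flow.last_ms = ts
        flow.n_packets += 1
        flow.n_bytes += length
    return flows


packets = [
    (1000, "10.0.0.1", "10.0.0.2", 5000, 80, 6, 60),
    (1005, "10.0.0.1", "10.0.0.2", 5000, 80, 6, 1500),
    (1010, "10.0.0.3", "10.0.0.2", 5001, 53, 17, 80),
]
for key, flow in aggregate_flows(packets).items():
    print(key, flow.n_packets, flow.n_bytes, flow.duration_ms)
```

Three packets collapse into two flow records. This reduction is what makes flow monitoring cheaper than storing every packet.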
A significant challenge in using current flow-based benchmark datasets, however, is their inconsistent feature sets, which hinder uniform analysis across them. Each dataset typically presents a unique set of features, complicating the comparison and evaluation of ML models across datasets [15]. Sarhan et al. addressed this gap by introducing NetFlow versions of four highly cited flow-based benchmark datasets, standardised to a common feature set [16, 17]. NetFlow is the most widely used format for collecting flow information in real-world production networks [18]. Although these NetFlow datasets [16, 17] address the lack of standardised feature sets, they lack most temporal NetFlow features. Consequently, they fall short when sequential neural network models are employed or when the temporal structure of network traffic is leveraged to identify attacks. Including detailed temporal information in NIDS datasets significantly enhances our ability to analyse traffic patterns and detect anomalies associated with different network attacks [19].

This research bridges this gap by introducing a new NetFlow version of four common NIDS benchmark datasets, UNSW-NB15 [20], BoT-IoT [21], ToN-IoT [22], and CSE-CIC-IDS2018 [23], that incorporates temporal NetFlow features. These new versions are publicly available via [1]. The temporal features and other specifications of these datasets are detailed in Section 4.

Using these datasets [1], we investigate their temporal characteristics through multiple analytical approaches. First, we analyse the flow duration distribution in detail to illustrate the temporal patterns associated with each class of network behaviour within the datasets. Similarly, we examine the distribution of inter-packet arrival times (IAT) to reveal patterns distinctive to each traffic category. Second, we employ time-series representations to track network activity over time.
These visualisations highlight specific attack periods alongside normal traffic patterns, and both numerical and categorical features are visualised within them. Finally, we apply a Time-Frequency Distribution (TFD) representation to explore the frequency components of traffic data over time. Inspired by work in activity recognition, where TFD successfully identified subtle activity patterns [24, 25], we hypothesise that network attacks might also exhibit unique TFD signatures. TFD has also been used in NIDS, where network traffic is transformed into images that convolutional neural networks (CNNs) analyse for attack classification [26, 27]. Although our initial investigations have not yet yielded definitive results, they suggest promising directions for future research on how network attacks are detected and classified.

By conducting a thorough analysis of network behaviour through NetFlow datasets, we build a foundational understanding of the network dynamics they capture. This step is crucial because it provides insights into the typical traffic patterns and interactions within the network, fostering a human-level understanding of network behaviour. Such insights are instrumental in designing more targeted and effective strategies for network monitoring and anomaly detection, even without directly developing or evaluating machine learning models [28]. Our main contributions in this work are as follows:

- Comprehensive Temporal Analysis of Network Traffic: We conduct an extensive temporal analysis to demonstrate the evolving dynamics of network traffic and security threats. Through detailed visualisations, including traffic distribution patterns, flow length distributions per attack class, and time-frequency domain representations, we provide novel insights into network behaviour, advancing the understanding of temporal aspects of network security.
- Public Release of NetFlow-Based Datasets with Temporal Features: We convert four widely used benchmark NIDS datasets into NetFlow format, incorporating temporal features that were absent from previously available NetFlow-based benchmark datasets. These enhancements standardise the dataset format, ensuring consistency for machine learning model evaluation, and significantly improve the datasets' utility for temporal analysis, which can support more accurate anomaly detection. We make these enriched NetFlow datasets publicly available as a resource for the research community to support ongoing advances in ML-based network intrusion detection.

The paper is structured as follows: Section 2 reviews related work, Section 3 describes the original NIDS datasets, Section 4 introduces the NF3 datasets, Section 5 presents the temporal analysis, and the final section concludes the paper with future work directions.

## 2 Related Works

Dataset analysis is essential for understanding the strengths and limitations of different NIDS datasets. Recent studies [29] and [30] have surveyed and compared publicly available NIDS datasets, highlighting their diverse characteristics and limitations and noting that dataset quality can significantly affect the performance of detection models. For instance, some datasets do not accurately mirror real-world network scenarios, which undermines the reliability of research based on them. In one case, the traffic patterns of NetFlow datasets were directly compared with real-world traffic, revealing significant discrepancies in statistical features between synthetic and real datasets [31].
However, that comparison overlooks malicious flows and does not address the temporal dynamics of network interactions. Similarly, the authors in [32] compared the input complexity of real-world and lab-based traffic but stopped short of extending this analysis to temporal sequences, which are essential for uncovering deeper behavioural insights. Further, the researchers in [5] explored how dataset characteristics influence NIDS performance, underlining the critical role of careful dataset selection. Their citation-based analysis highlights the popularity of various NIDS approaches, guiding future research directions in the field. Additionally, [15] provides a thorough review of methodologies for evaluating NIDS models and stresses the importance of testing these models across multiple datasets to ensure their robustness and applicability. In line with this recommendation, our work equips four widely recognised NIDS datasets with standardised NetFlow features.

Beyond benchmarking, recent studies have focused on understanding normal traffic patterns in NIDS datasets to enhance anomaly detection. Studies such as [33, 34, 35] aim to model normal traffic closely enough that any deviation can be flagged as a potential threat. The authors in [33] demonstrated the need to monitor the distributions of traffic features, since changes in these distributions can indicate anomalies. Working with collected network data into which anomalies had been injected, they found that these anomalies fall into distinct clusters. The authors in [34] highlighted the advantages of entropy-based approaches for anomaly detection. Their investigation covered both flow-header and behavioural features and demonstrated a strong correlation between the entropy values, suggesting that these features offer comparable effectiveness in detecting anomalies.
The authors in [35] proposed a network traffic model based on analysing the source-destination flows in a network.

Another significant body of work analyses specific traffic features to gain deeper insight into network behaviour. For example, flow length is a focal point of extensive research because it offers deep insights into network traffic behaviour [36, 37]. The studies in [38, 39, 40] address elephant flow detection, that is, identifying large, long-lived network flows that consume a significant amount of bandwidth. Typically, benign traffic exhibits a certain range of flow lengths depending on the application protocols and user behaviour. In contrast, malicious traffic, such as that generated by port scanning, DoS attacks, or data exfiltration, often shows distinctive flow length characteristics that deviate from the norm [41]. Additionally, several studies emphasise the significance of the IAT feature, alongside other key flow characteristics, for monitoring traffic patterns [42, 43]. The work in [42] analysed traffic characteristics, including IAT, across ten data centre networks spanning several administrative domains, including universities, enterprises, and cloud service providers. The aim was to understand the distinct traffic patterns and underlying dynamics of these data centres by examining both flow-level and packet-level attributes associated with different layer-7 applications. Meanwhile, the authors in [43] extended this analysis by examining the distributions of key traffic features. Their data collection covered three levels of network monitoring: SNMP counters for basic metrics, sampled flow or packet header data for more granular insight, and deep packet inspection for detailed content analysis.
While the primary focus of that study was evaluating network traffic volume and identifying congestion, it also covered other traffic patterns, including server interactions, flow metrics, and bandwidth usage.

Despite the demonstrated benefits of temporal analysis in these studies, temporal analysis of data in standard flow formats such as NetFlow [44] remains under-explored. Studies have examined the effectiveness of sequential learning models, such as Long Short-Term Memory (LSTM) networks, in extracting temporal characteristics from NetFlow data for NIDS [45]. Some researchers have combined CNN and LSTM models into a hybrid model [46, 47]; CNNs are mainly used to extract spatial features and underpin many notable computer vision applications [48]. In [46, 47], the authors introduce the Spatial and Temporal Aware Intrusion Detection Model (STIDM), a spatio-temporal feature extraction model designed to analyse IAT features between consecutive packets. The model uses a well-known CNN architecture, LeNet-5, to extract spatial features, complemented by a modified LSTM to capture temporal patterns. Although this method groups packets into flows, it does not capture broader temporal patterns across NetFlow records, so it cannot be used to explore temporal dependencies at the NetFlow level. The authors in [45] explore temporal sequences of network flows that indicate malicious activity. Their main focus was not to compete with state-of-the-art solutions but to find specific temporal patterns, if they exist, for each attack class. The paper investigates the use of LSTM neural networks to learn temporal patterns in network flows for NIDS and compares the LSTM's performance with that of a static Feed-forward Neural Network (FNN) model.
Their goal is similar to ours, but we are more interested in understanding the temporal aspect at the feature level within NetFlow datasets. Building on these initial forays into temporal NetFlow analysis, our research aims to provide a deep understanding of the temporal features in NetFlow datasets. We focus specifically on the temporal dynamics of these datasets without the direct intention of developing new anomaly detection models. Instead, our objective is to enrich the analytical tools available for network security, providing insights that support the real-time detection and analysis of network anomalies. By making these enriched datasets publicly available, we also offer the research community resources for more detailed and effective analysis of network behaviour.

## 3 NIDS Datasets

High-quality datasets are essential for the effective development and evaluation of ML-based NIDSs [14]. Historical datasets such as KDD Cup 99 and NSL-KDD, while once foundational, have become less relevant because their attack patterns date from the late 1990s and early 2000s [49]. The evolving nature of cyber threats calls for up-to-date datasets that mirror current network environments and attack patterns [20], so that ML models are evaluated against current challenges and tailored to emerging threats. This paper uses four contemporary datasets for this purpose, each providing a rich source of network traffic data reflecting current network environments:

- UNSW-NB15 [20]: Developed by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using the IXIA PerfectStorm tool to create a mix of normal and malicious traffic, including 12 synthetic attack scenarios.
- BoT-IoT [21]: Also created by ACCS, this dataset includes a comprehensive mix of benign and malicious traffic covering five types of attack scenarios.
- ToN-IoT [22]: A heterogeneous dataset, also created by ACCS, encompassing telemetry data of IoT services and operating system logs, designed to support the development and evaluation of NIDSs. It contains 9 attack classes.
- CSE-CIC-IDS2018 [23]: Released by a collaboration between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC), this dataset simulates realistic network traffic combined with non-overlapping attacks.

Despite their usefulness for single-dataset evaluation, the inconsistent feature sets across these datasets make it challenging to ensure fair and reliable evaluations of ML-NIDS models [15]. To address this gap, previous efforts standardised these datasets to a unified NetFlow format [16, 17], enhancing their usability for consistent model evaluation. The authors identified 43 features that were most effective in classifying the attack classes in the datasets. Table 1 lists the full set of features used in the previous NetFlow datasets [17], together with the temporal features added in this version (in bold), which are explained in the next section.
Table 1: List of the proposed standard NetFlow features and the added temporal features

| Feature | Description |
| --- | --- |
| IPV4_SRC_ADDR | IPv4 source address |
| IPV4_DST_ADDR | IPv4 destination address |
| L4_SRC_PORT | IPv4 source port number |
| L4_DST_PORT | IPv4 destination port number |
| PROTOCOL | IP protocol identifier byte |
| L7_PROTO | Application protocol (numeric) |
| IN_BYTES | Incoming number of bytes |
| OUT_BYTES | Outgoing number of bytes |
| IN_PKTS | Incoming number of packets |
| OUT_PKTS | Outgoing number of packets |
| FLOW_DURATION_MILLISECONDS | Flow duration in milliseconds |
| TCP_FLAGS | Cumulative of all TCP flags |
| CLIENT_TCP_FLAGS | Cumulative of all client TCP flags |
| SERVER_TCP_FLAGS | Cumulative of all server TCP flags |
| DURATION_IN | Client to server stream duration (msec) |
| DURATION_OUT | Server to client stream duration (msec) |
| MIN_TTL | Min flow TTL |
| MAX_TTL | Max flow TTL |
| LONGEST_FLOW_PKT | Longest packet (bytes) of the flow |
| SHORTEST_FLOW_PKT | Shortest packet (bytes) of the flow |
| MIN_IP_PKT_LEN | Length of the smallest flow IP packet observed |
| MAX_IP_PKT_LEN | Length of the largest flow IP packet observed |
| SRC_TO_DST_SECOND_BYTES | Src to dst bytes/sec |
| DST_TO_SRC_SECOND_BYTES | Dst to src bytes/sec |
| RETRANSMITTED_IN_BYTES | Number of retransmitted TCP flow bytes (src→dst) |
| RETRANSMITTED_IN_PKTS | Number of retransmitted TCP flow packets (src→dst) |
| RETRANSMITTED_OUT_BYTES | Number of retransmitted TCP flow bytes (dst→src) |
| RETRANSMITTED_OUT_PKTS | Number of retransmitted TCP flow packets (dst→src) |
| SRC_TO_DST_AVG_THROUGHPUT | Src to dst average throughput (bps) |
| DST_TO_SRC_AVG_THROUGHPUT | Dst to src average throughput (bps) |
| NUM_PKTS_UP_TO_128_BYTES | Packets whose IP size ≤ 128 |
| NUM_PKTS_128_TO_256_BYTES | Packets whose IP size > 128 and ≤ 256 |
| NUM_PKTS_256_TO_512_BYTES | Packets whose IP size > 256 and ≤ 512 |
| NUM_PKTS_512_TO_1024_BYTES | Packets whose IP size > 512 and ≤ 1024 |
| NUM_PKTS_1024_TO_1514_BYTES | Packets whose IP size > 1024 and ≤ 1514 |
| TCP_WIN_MAX_IN | Max TCP window (src→dst) |
| TCP_WIN_MAX_OUT | Max TCP window (dst→src) |
| ICMP_TYPE | ICMP type * 256 + ICMP code |
| ICMP_IPV4_TYPE | ICMP type |
| DNS_QUERY_ID | DNS query transaction ID |
| DNS_QUERY_TYPE | DNS query type (e.g., 1 = A, 2 = NS, ...) |
| DNS_TTL_ANSWER | TTL of the first A record (if any) |
| FTP_COMMAND_RET_CODE | FTP client command return code |
| **FLOW_START_MILLISECONDS** | Flow start timestamp in milliseconds |
| **FLOW_END_MILLISECONDS** | Flow end timestamp in milliseconds |
| **SRC_TO_DST_IAT_MIN** | Minimum IAT (src→dst) |
| **SRC_TO_DST_IAT_MAX** | Maximum IAT (src→dst) |
| **SRC_TO_DST_IAT_AVG** | Average IAT (src→dst) |
| **SRC_TO_DST_IAT_STDDEV** | Standard deviation of IAT (src→dst) |
| **DST_TO_SRC_IAT_MIN** | Minimum IAT (dst→src) |
| **DST_TO_SRC_IAT_MAX** | Maximum IAT (dst→src) |
| **DST_TO_SRC_IAT_AVG** | Average IAT (dst→src) |
| **DST_TO_SRC_IAT_STDDEV** | Standard deviation of IAT (dst→src) |

## 4 NetFlow Datasets Version 3

This section introduces the NF3-Datasets, the third iteration of NetFlow-based datasets, converted from the four aforementioned datasets [20, 23, 22, 21]. These conversions standardise the representation of network flows, enabling consistent cross-dataset analysis and facilitating intrusion detection research. The selection of features extracted from the original datasets was rigorously assessed in the previous version [50]; consequently, the current datasets retain the established feature set while enriching it with time-related features, as explained below.

### 4.1 Temporal Features

As can be seen in Table 1, the features included in this version are those of the previous version [17] plus the temporal features.
The added features provide a temporal dimension for network traffic analysis, enabling events to be precisely identified and correlated over time. They fall into two categories: “Flow Timing”, which gives the start and end time of each flow in milliseconds, and “Inter-Packet Arrival Time”, which gives statistics of the arrival times between consecutive packets in a flow. Flow timing enables researchers to sequence network flows accurately, ensuring that data aggregation and analysis reflect the true dynamics of network interactions. In the datasets, these timing values are stored as Unix epoch timestamps in milliseconds, that is, the number of milliseconds elapsed since 1 January 1970 (UTC). Precise timing is critical for tasks such as event correlation, where the order and duration of flows can reveal patterns indicative of coordinated attacks or system anomalies.

Inter-packet Arrival Time (IAT) is another crucial metric, offering valuable insights into the dynamics of network traffic. IAT is the time interval between the arrivals, at a network device, of consecutive packets travelling in the same direction, either from source to destination or vice versa. To capture it, each packet's timestamp is recorded on arrival and the difference between consecutive timestamps is computed. These differences are then summarised per flow as the minimum, maximum, average, and standard deviation. Although these metrics originate from packet-level observations, they are aggregated at the flow level to provide a more comprehensive view of traffic patterns. Examining the IAT over time can yield comprehensive insights into the behaviour of traffic flows.
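The per-flow IAT statistics and flow timing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not nProbe's implementation. Timestamps are assumed to be sorted packet arrival times in milliseconds for one direction of a flow. The standard deviation uses the population form, and a direction with fewer than two packets is reported as zero. The helper names `iat_stats` and `to_utc` are hypothetical.

```python
import statistics
from datetime import datetime, timezone


def iat_stats(timestamps_ms):
    """IAT min/max/avg/stddev for one direction of a flow.

    Assumes `timestamps_ms` are packet arrival times in milliseconds for a
    single direction (e.g. src->dst), sorted in ascending order.
    """
    if len(timestamps_ms) < 2:
        # No inter-arrival gap exists; reporting 0 is an assumption.
        return {"min": 0, "max": 0, "avg": 0.0, "stddev": 0.0}
    # Difference between each pair of consecutive arrival timestamps.
    iats = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return {
        "min": min(iats),
        "max": max(iats),
        "avg": statistics.fmean(iats),
        "stddev": statistics.pstdev(iats),  # population standard deviation
    }


def to_utc(epoch_ms):
    """Convert a FLOW_START/FLOW_END millisecond timestamp to a UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


src_to_dst = [1000, 1010, 1030, 1060]  # hypothetical arrival times (ms)
print(iat_stats(src_to_dst))
flow_start, flow_end = 1000, 1060  # hypothetical FLOW_START/END_MILLISECONDS
print(to_utc(flow_start), "duration (ms):", flow_end - flow_start)
```

Whether an exporter reports the sample or the population standard deviation, and what it emits for single-packet flows, should be checked against its documentation before comparing values with the dataset columns.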
Researchers are drawn to these features because they can uncover subtle deviations from normal traffic patterns [33, 42, 43], providing a deeper layer of analysis that can improve the detection of both sophisticated and low-profile network attacks.

<details> <summary>2503.04404v2/extracted/6264149/Process.png Details</summary>

Workflow diagram: PCAP files → nProbe (configured with the defined features) → NetFlow dataset (unlabelled) → labelling process (using the ground-truth file) → final NetFlow dataset (labelled).

</details>

Figure 1: Illustration of the Dataset Conversion and Labelling Process

### 4.2 Conversion Methodology

The providers of the original datasets [20, 23, 22, 21] have released their source files in various formats, enabling researchers to adapt and use these datasets for specific research needs and to address known limitations. As shown in [16, 17], this flexibility helps mitigate the feature divergence found across NIDS datasets by allowing the datasets to be regenerated with a standardised feature set in NetFlow format.
The process of generating the current version of the NetFlow datasets is the same as for the previous versions [16, 17] and is illustrated in Figure 1. The conversion was run on a machine with Ubuntu 20.04 LTS and the nProbe software. nProbe, developed by ntop [51], is designed to process raw network traffic and convert it into NetFlow records. As shown in Figure 1, the workflow begins with acquiring the PCAP files, which are publicly available for each dataset on its official website. Given the volume of data, significant storage capacity is required; for instance, the CSE-CIC-IDS2018 dataset [23] alone comprises more than 4,000 PCAP files, totalling over 400 gigabytes. Once collected, the PCAP files are converted using the following nProbe invocation:

```shell
nprobe -i file.pcap -V 9 --dont-reforge-time -T "%feature1%feature2%featureN" --dump-path <path> --dump-format t --csv-separator '#'
```

In this command, `-i` specifies the input file, `-V 9` sets the NetFlow version to 9, and `--dont-reforge-time` preserves the original timestamps of the network traffic, so that the timing data are not rewritten to the time of command execution. `--dump-path` sets the directory for the output files, `--dump-format t` selects text output, and `--csv-separator '#'` separates the columns with '#' in the resulting files. The `-T` option specifies the template of flow features to extract; this configuration extracts 57 different flow features. The command produces a series of text files that record all flows chronologically with their timing information. These text files are then merged and converted into CSV format for easier reading and organisation. At this stage, we have four datasets containing detailed flow information.
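The merge step can be sketched with Python's standard `csv` module. The sketch rests on hypothetical assumptions: each nProbe dump file starts with a header line of field names, all files share that header, and the dumps match the glob `pattern` somewhere under `dump_dir`. The function name `merge_dumps` is illustrative only.

```python
import csv
from pathlib import Path


def merge_dumps(dump_dir, out_csv, pattern="*"):
    """Merge '#'-separated text dumps under dump_dir into one comma-separated CSV.

    Assumptions (hypothetical): each dump starts with a header line of field
    names and every dump shares that header. Returns the number of flow rows.
    """
    out_path = Path(out_csv).resolve()
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Files are visited in path order; this need not be chronological.
        for path in sorted(Path(dump_dir).rglob(pattern)):
            if not path.is_file() or path.resolve() == out_path:
                continue  # skip directories and the output file itself
            with open(path, newline="") as f:
                reader = csv.reader(f, delimiter="#")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty dump file
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Because file-name order is only a heuristic for capture order, re-sorting the merged rows by `FLOW_START_MILLISECONDS` is a safer way to guarantee a chronological dataset.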
These datasets are not yet labelled: they neither distinguish normal from malicious flows nor identify the specific attack types among the malicious flows. The next phase therefore labels each flow by comparing it with the corresponding ground-truth file. Flows are matched to their ground-truth labels using precise timestamps and the 5-tuple identifiers (source/destination IP, source/destination port, protocol). The labelling stage augments the datasets with two columns: one for binary classification and another for multi-class classification. In the binary column, a label of 0 denotes a benign flow and a label of 1 a malicious flow; Table 2 summarises the binary labels. The multi-class column records the specific attack type, as documented in the ground-truth files, allowing a granular analysis of threat types; Table 3 details the distribution of attack classes within the datasets.

Table 2: Summary of Malicious and Benign Flows in NF3-Datasets

| Dataset | Malicious | Benign | Total |
| --- | --- | --- | --- |
| NF3-UNSW-NB15 | 127,693 (5.40%) | 2,237,731 (94.60%) | 2,365,424 |
| NF3-CSE-CIC-IDS2018 | 2,600,903 (12.93%) | 17,514,626 (87.07%) | 20,115,529 |
| NF3-ToN-IoT | 10,728,046 (38.98%) | 16,792,214 (61.02%) | 27,520,260 |
| NF3-BoT-IoT | 16,881,819 (99.7%) | 51,989 (0.3%) | 16,933,808 |

Table 3: Statistics of attack types across the datasets, showing the count of flows categorised under each attack and benign class.
| Class | NF3-UNSW-NB15 | NF3-CSE-CIC-IDS2018 | NF3-ToN-IoT | NF3-BoT-IoT |
| --- | --- | --- | --- | --- |
| Benign | 2,237,731 | 17,514,626 | 16,792,214 | 51,989 |
| DoS | 5,980 | 302,966 | 203,456 | 8,034,190 |
| DDoS | — | 1,324,350 | 4,141,256 | 7,150,882 |
| Reconnaissance | 17,074 | — | — | 1,695,132 |
| Backdoor | 1,226 | — | 203,384 | — |
| Fuzzers | 33,816 | — | — | — |
| Exploits | 42,748 | — | — | — |
| Analysis | 2,381 | — | — | — |
| Generic | 19,651 | — | — | — |
| Shellcode | 4,659 | — | — | — |
| Worms | 158 | — | — | — |
| Web Attacks | — | 2,538 | — | — |
| Infiltration | — | 188,152 | — | — |
| BoT | — | 207,703 | — | — |
| Brute Force | — | 575,194 | — | — |
| Scanning | — | — | 1,358,977 | — |
| XSS | — | — | 2,834,435 | — |
| Password | — | — | 1,594,777 | — |
| Injection | — | — | 381,777 | — |
| Ransomware | — | — | 3,971 | — |
| MITM | — | — | 6,013 | — |
| Theft | — | — | — | 1,615 |
| Total | 2,365,424 | 20,115,529 | 27,520,260 | 16,933,808 |

The resulting labelled datasets are the four final datasets proposed in this paper: NF3-UNSW-NB15, NF3-BoT-IoT, NF3-ToN-IoT, and NF3-CSE-CIC-IDS2018. All four share the same feature set, which enables consistent evaluation and comparison of ML-NIDS models. The timestamp fields record when the original traffic was captured: they are the timestamps documented in the respective PCAP files, not the times at which the data was converted to NetFlow format. This distinction preserves the temporal integrity of the original network conditions. The next section presents the temporal analysis of these datasets, exploring the dynamic patterns and temporal characteristics of the traffic and providing deeper insight into the timing and progression of the recorded network behaviour.
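The labelling step of Section 4.2 can be illustrated with a minimal sketch. It matches each flow against ground-truth records by 5-tuple and an overlapping time window, then counts flows in the style of Table 2. The `GroundTruth` record layout, the `tolerance_ms` slack, and the `Label`/`Attack` output column names are assumptions for illustration. The ground-truth files differ per dataset, and the exact matching rules used for NF3 may differ from this sketch.

```python
from dataclasses import dataclass


@dataclass
class GroundTruth:
    """One ground-truth attack record (hypothetical layout)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    start_ms: int
    end_ms: int
    attack: str


def label_flows(flows, ground_truth, tolerance_ms=1000):
    """Add a binary 'Label' and a multi-class 'Attack' field to each flow dict.

    A flow is marked malicious if a ground-truth record has the same 5-tuple
    and a time window overlapping the flow's [start, end] interval, widened
    by tolerance_ms on both sides (a hypothetical matching rule).
    """
    index = {}
    for gt in ground_truth:
        key = (gt.src_ip, gt.dst_ip, gt.src_port, gt.dst_port, gt.protocol)
        index.setdefault(key, []).append(gt)

    for flow in flows:
        key = (flow["IPV4_SRC_ADDR"], flow["IPV4_DST_ADDR"], flow["L4_SRC_PORT"],
               flow["L4_DST_PORT"], flow["PROTOCOL"])
        start = flow["FLOW_START_MILLISECONDS"] - tolerance_ms
        end = flow["FLOW_END_MILLISECONDS"] + tolerance_ms
        # First ground-truth record with the same 5-tuple whose window overlaps.
        match = next((gt for gt in index.get(key, [])
                      if gt.start_ms <= end and gt.end_ms >= start), None)
        flow["Label"] = 0 if match is None else 1
        flow["Attack"] = "Benign" if match is None else match.attack
    return flows


def binary_summary(flows):
    """Malicious/benign counts and malicious share, in the style of Table 2."""
    total = len(flows)
    malicious = sum(flow["Label"] for flow in flows)
    share = 100 * malicious / total if total else 0.0
    return {"malicious": malicious, "benign": total - malicious,
            "total": total, "malicious_pct": share}
```

Indexing the ground truth by 5-tuple keeps each lookup cheap even for tens of millions of flows. Matching in one direction only is a simplification; ground-truth files that record the reverse direction would need both orderings checked.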
5 Temporal Analysis

Gaining a human-level understanding of network traffic is essential before moving on to predictive modelling [52]. Incorporating temporal information into the NetFlow datasets lets us apply various temporal analysis methods to gain deeper insights into network behaviour. As mentioned in the related work section, many studies have explored network attack patterns over time [45]. However, whereas those approaches typically aim at classification, this work focuses on temporal analysis at the feature level within NetFlow datasets: the goal is not to classify or predict specific types of network attacks but to deepen our understanding of the inherent temporal characteristics of network features. In this section, we analyse the NetFlow datasets from multiple perspectives to uncover insights into the dynamics of network traffic.

[Figure 2 image: stacked bar charts of flow counts per traffic class for (a) NF3-BoT-IoT, (b) NF3-CSE-CIC-IDS2018, (c) NF3-ToN-IoT and (d) NF3-UNSW-NB15; x-axis: flow length (seconds), 0 to 120; y-axis: frequency (log scale).]

Figure 2: Flow length distribution in NF3-Datasets. The x-axis represents the length of flows in seconds, while the y-axis represents the frequency of each length bin, i.e., the number of flows whose length falls in that bin.

5.1 Flow Length Distribution

The analysis of flow length distribution (FLD) across the datasets provides key insights into the behaviour of network traffic under both benign and malicious conditions. This subsection visualises and discusses the FLD of our NetFlow datasets. In Figure 2, each plot shows, for every traffic class, the frequency of flow lengths aggregated into 50 predefined bins.
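A minimal sketch of the binning behind Figure 2 follows. It assumes that flow durations are available in milliseconds and that the 50 bins are equal-width over 0 to 120 seconds (matching the two-minute nProbe export interval discussed next); the function name and these choices are illustrative assumptions, not the authors' plotting code.

```python
from collections import defaultdict


def flow_length_histogram(durations_ms, labels, n_bins=50, max_len_s=120.0):
    """Count flows per traffic class in n_bins equal-width flow-length bins.

    durations_ms: flow durations in milliseconds (assumed input unit)
    labels:       class name per flow, e.g. "Benign", "DoS"
    max_len_s:    upper edge of the last bin (assumed 120 s export interval)
    Returns (bin_edges_in_seconds, {class: [count per bin]}).
    """
    edges = [max_len_s * i / n_bins for i in range(n_bins + 1)]
    counts = defaultdict(lambda: [0] * n_bins)
    for d_ms, label in zip(durations_ms, labels):
        d_s = max(d_ms / 1000.0, 0.0)                          # ms -> s, no negatives
        idx = min(int(d_s * n_bins / max_len_s), n_bins - 1)   # clamp into last bin
        counts[label][idx] += 1
    return edges, dict(counts)


if __name__ == "__main__":
    edges, hist = flow_length_histogram(
        [500, 1_000, 119_900, 3_000], ["Benign", "Benign", "DoS", "DDoS"])
    for label, per_bin in hist.items():
        # Report only the non-empty bins, labelled by their edges in seconds.
        nonzero = {f"{edges[i]:.1f}-{edges[i + 1]:.1f}s": c
                   for i, c in enumerate(per_bin) if c}
        print(label, nonzero)
```

Drawing each class's counts as stacked bars on a logarithmic y-axis (for example with matplotlib's `bar` and `set_yscale("log")`) would give plots in the style of Figure 2.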
Flow lengths are bounded by two minutes because the nProbe tool is configured by default to export flow data at intervals not exceeding two minutes. This standard configuration allows efficient flow data collection without overwhelming the system with excessive data [51]; the two-minute interval provides a reasonable level of detail while minimising system resource consumption.

[Figure 3 image: stacked bar charts of flow counts per traffic class for (a) NF3-BoT-IoT, (b) NF3-CSE-CIC-IDS2018, (c) NF3-ToN-IoT and (d) NF3-UNSW-NB15; x-axis: time (seconds), 0 to 60; y-axis: frequency (log scale).]

Figure 3: Distribution of the average source-to-destination inter-packet arrival time.

(a) [stacked bar chart of flow counts per traffic class for NF3-BoT-IoT; x-axis: time (seconds), 0 to 60; y-axis: frequency (log scale).] <details> <summary>2503.04404v2/x10.png Details</summary> ![fdd5f26b7c34ea870516051590e5910f566b16921ebec9bea92eb92e5392d7db](http://localhost:8000/v1/image/fdd5f26b7c34ea870516051590e5910f566b16921ebec9bea92eb92e5392d7db) ### Visual Description # Technical Document Analysis of Network Traffic Frequency Chart ## Chart Overview The image is a **stacked bar chart** visualizing the frequency distribution of network traffic types over time. The chart uses a **logarithmic y-axis** (frequency) and a **linear x-axis** (time in seconds).
Key components include: --- ### **Axis Labels and Markers** - **X-axis**: "Time in seconds" with markers at intervals of 5 seconds (0, 5, 10, ..., 60). - **Y-axis**: "Frequency" with logarithmic scale markers: 10⁰, 10¹, 10², ..., 10⁷. --- ### **Legend and Categories** The legend (top-right corner) defines seven traffic types with color-coded segments: 1. **Web-Attack** (gray) 2. **Bot** (orange) 3. **BruteForce** (purple) 4. **Benign** (green) 5. **Infiltration** (yellow) 6. **DoS** (red) 7. **DDoS** (blue) --- ### **Data Structure** Each bar represents a 5-second interval (e.g., 0–5s, 5–10s, etc.). Bars are segmented vertically by traffic type, with heights proportional to frequency on the logarithmic scale. --- ### **Key Trends and Data Points** 1. **Benign Traffic (Green)**: - Dominates all intervals, consistently occupying the largest portion of bars. - Peaks at **~10⁶ frequency** around **20–25 seconds**. - Declines slightly after 30 seconds but remains the most frequent category. 2. **Web-Attack (Gray)**: - Only present in the **first 5 seconds** (0–5s interval). - Frequency: ~10² (100) at 0s, dropping to ~10¹ (10) by 5s. 3. **Bot Traffic (Orange)**: - Peaks at **~10⁵ frequency** around **10 seconds**. - Gradual decline after 15 seconds, with minimal presence after 30s. 4. **Infiltration (Yellow)**: - Highest frequency in **0–10s** (~10⁵). - Sharp decline after 15s, with sporadic low-level activity thereafter. 5. **DoS (Red)**: - Minimal until **35 seconds**, then spikes to **~10⁵ frequency**. - Declines sharply after 40s. 6. **BruteForce (Purple)** and **DDoS (Blue)**: - Both categories show **negligible frequency** (<10²) across all intervals. --- ### **Spatial Grounding** - **Legend Position**: Top-right corner, aligned with the chart's upper boundary. - **Color Consistency**: All segments match legend colors (e.g., green = Benign, red = DoS). --- ### **Trend Verification** - **Benign**: Steady high frequency, logarithmic scale emphasizes dominance. 
- **Web-Attack**: Sudden drop-off after 5s confirms short-lived attack. - **Bot/Infiltration**: Early peaks align with initial attack phases. - **DoS**: Late emergence (35s) suggests delayed or targeted attack. --- ### **Conclusion** The chart reveals a dynamic network traffic profile, with **Benign traffic** as the baseline and **DoS** as the most impactful late-stage anomaly. Early intervals show mixed attack activity (Web-Attack, Bot, Infiltration), while later intervals highlight DoS as the primary threat. *Note: No non-English text or embedded tables were present.* </details> (b) <details> <summary>2503.04404v2/x11.png Details</summary> ![3dbd1d44fda57cfa79a55f0719d33a62e971f4e3d6c6084110ac3cc659dd5f80](http://localhost:8000/v1/image/3dbd1d44fda57cfa79a55f0719d33a62e971f4e3d6c6084110ac3cc659dd5f80) ### Visual Description # Technical Document Extraction: Bar Chart Analysis ## 1. Labels and Axis Titles - **X-axis**: "Time in seconds" (linear scale, 0 to 60 seconds) - **Y-axis**: "Frequency" (logarithmic scale, 10⁰ to 10⁷) - **Legend**: Located in the top-right corner, with color-coded categories. ## 2. Legend Categories and Colors | Category | Color | Description | |----------------|-----------|---------------------------------| | ransomware | Orange | High frequency at time 0 | | mitm | Pink | High frequency at time 0 | | backdoor | Blue | High frequency at time 0 | | dos | Red | High frequency at time 0 | | injection | Cyan | High frequency at time 0 | | password | Green | Increases over time | | xss | Dark Green| High frequency at time 0 | | scanning | Light Green| High frequency at time 0 | | ddos | Dark Blue | High frequency at time 0 | | Benign | Dark Blue | High frequency at time 0 | ## 3. Chart Description - **Structure**: Stacked bar chart with vertical bars representing frequency at each time interval. 
- **Key Observations**: - **Time 0**: All categories (ransomware, mitm, backdoor, dos, injection, xss, scanning, ddos, Benign) show high frequencies (10³–10⁴ range). - **Time 5–15**: Frequencies for ransomware, mitm, and backdoor decline sharply. Password and xss show moderate declines. - **Time 20–35**: Password frequency increases significantly (peaking at ~10⁵ at time 35). Other categories remain low or absent. - **Time 40–60**: Only password and xss categories appear, with frequencies dropping below 10². ## 4. Trend Verification - **Ransomware (Orange)**: Peaks at time 0 (~10³), then declines to near-zero by time 10. - **Mitm (Pink)**: Peaks at time 0 (~10⁴), declines to ~10² by time 10, and disappears after time 15. - **Backdoor (Blue)**: Peaks at time 0 (~10⁴), declines to ~10² by time 10, and disappears after time 15. - **Dos (Red)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. - **Injection (Cyan)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. - **Password (Green)**: Starts low (~10² at time 0), increases steadily to ~10⁵ at time 35, then declines. - **Xss (Dark Green)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. - **Scanning (Light Green)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. - **Ddos (Dark Blue)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. - **Benign (Dark Blue)**: Peaks at time 0 (~10³), declines to ~10² by time 10, and disappears after time 15. ## 5. 
Data Table Reconstruction | Time (s) | Ransomware | Mitm | Backdoor | Dos | Injection | Password | Xss | Scanning | Ddos | Benign | |----------|------------|--------|----------|-------|-----------|----------|---------|----------|--------|---------| | 0 | ~10³ | ~10⁴ | ~10⁴ | ~10³ | ~10³ | ~10² | ~10³ | ~10³ | ~10³ | ~10³ | | 5 | ~10³ | ~10³ | ~10³ | ~10³ | ~10³ | ~10² | ~10³ | ~10³ | ~10³ | ~10³ | | 10 | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | | 15 | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | ~10² | | 20 | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10² | ~10² | ~10² | ~10¹ | ~10¹ | | 25 | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10² | ~10² | ~10² | ~10¹ | ~10¹ | | 30 | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10² | ~10² | ~10² | ~10¹ | ~10¹ | | 35 | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10¹ | ~10⁵ | ~10² | ~10² | ~10¹ | ~10¹ | | 40 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10² | ~10² | ~10² | ~10⁰ | ~10⁰ | | 45 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10² | ~10² | ~10² | ~10⁰ | ~10⁰ | | 50 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10² | ~10² | ~10² | ~10⁰ | ~10⁰ | | 55 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10² | ~10² | ~10² | ~10⁰ | ~10⁰ | | 60 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10² | ~10² | ~10² | ~10⁰ | ~10⁰ | ## 6. Notes - **Logarithmic Y-axis**: Frequencies are represented on a logarithmic scale, making small values (e.g., 10⁰) appear compressed. - **No Exact Numerical Data**: The chart provides visual trends but does not include precise numerical values for frequencies. - **Color Consistency**: All legend colors match the corresponding bar segments in the chart. </details> (c) <details> <summary>2503.04404v2/x12.png Details</summary> ![1aa0ecd6a3f240d25aae8f8131e973e8ad33d197820fb5aa7e28cc4b50b29126](http://localhost:8000/v1/image/1aa0ecd6a3f240d25aae8f8131e973e8ad33d197820fb5aa7e28cc4b50b29126) ### Visual Description # Technical Document Extraction: Network Traffic Frequency Analysis ## 1. 
Labels and Axis Titles - **X-Axis**: "Time in seconds" (linear scale, 0–60 seconds, increments of 5) - **Y-Axis**: "Frequency" (logarithmic scale, 10⁰ to 10⁷) - **Legend**: Located in the top-right corner, color-coded categories: - **Worms**: Blue - **Backdoor**: Light blue - **Generic**: Gray - **Exploits**: Brown - **Analysis**: Pink - **DoS**: Red - **Fuzzers**: Purple - **Shellcode**: Yellow - **Reconnaissance**: Light pink - **Benign**: Green ## 2. Categories and Sub-Categories - **Primary Categories** (Legend labels): - Worms - Backdoor - Generic - Exploits - Analysis - DoS - Fuzzers - Shellcode - Reconnaissance - Benign ## 3. Chart Structure - **Main Chart**: Stacked bar chart with time (x-axis) vs. frequency (y-axis). - **Bars**: Each bar represents cumulative frequency of categories at specific time intervals (0–60 seconds). - **Stacking**: Colors within bars correspond to legend categories (e.g., green = Benign, blue = Worms). ## 4. Key Trends and Data Points ### Benign Traffic - **Dominant Category**: Tallest bars across all time intervals. - **Peak Frequency**: ~10⁶ at time = 0 seconds, decreasing to ~10³ by 60 seconds. - **Trend**: Gradual decline over time. ### Worms and Backdoor - **Secondary Categories**: Frequencies ~10⁴–10⁵ at time = 0, declining to ~10² by 60 seconds. - **Trend**: Steeper decline than Benign. ### Exploits, Analysis, DoS, Fuzzers, Shellcode, Reconnaissance - **Low Frequencies**: Mostly <10³ across all time intervals. - **Exceptions**: - **Exploits**: ~10² at time = 10 seconds. - **Reconnaissance**: ~10² at time = 5 seconds. - **Trend**: Minimal activity; sporadic peaks. ### Generic - **Moderate Frequencies**: ~10³–10⁴ at time = 0, declining to ~10¹ by 60 seconds. - **Trend**: Steady decline. ## 5. 
Data Table Reconstruction | Time (s) | Benign (10ⁿ) | Worms (10ⁿ) | Backdoor (10ⁿ) | Generic (10ⁿ) | Exploits (10ⁿ) | Analysis (10ⁿ) | DoS (10ⁿ) | Fuzzers (10ⁿ) | Shellcode (10ⁿ) | Reconnaissance (10ⁿ) | |----------|--------------|-------------|----------------|---------------|----------------|----------------|-----------|---------------|-----------------|----------------------| | 0 | ~10⁶ | ~10⁵ | ~10⁴ | ~10³ | ~10² | ~10¹ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | | 5 | ~10⁴ | ~10³ | ~10² | ~10² | ~10¹ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | | 10 | ~10³ | ~10² | ~10¹ | ~10¹ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | | 15 | ~10² | ~10¹ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | | 20 | ~10¹ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | | 25–60 | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | ~10⁰ | *Note: Values are approximations based on bar heights and logarithmic scale.* ## 6. Spatial Grounding and Color Verification - **Legend Position**: Top-right corner (x: 0.85, y: 0.95 relative to chart bounds). - **Color Consistency**: - Green bars = Benign (confirmed across all time intervals). - Blue bars = Worms (confirmed at time = 0, 5, 10). - Light blue bars = Backdoor (confirmed at time = 0, 5). - Gray bars = Generic (confirmed at time = 0, 10, 15). - Brown bars = Exploits (confirmed at time = 10). - Pink bars = Analysis (confirmed at time = 0, 5). - Red bars = DoS (confirmed at time = 0). - Purple bars = Fuzzers (confirmed at time = 0). - Yellow bars = Shellcode (confirmed at time = 0). - Light pink bars = Reconnaissance (confirmed at time = 5). ## 7. Component Isolation - **Header**: Chart title (not explicitly labeled, inferred as "Network Traffic Frequency Analysis"). - **Main Chart**: Stacked bars with time-frequency relationship. - **Footer**: X-axis label ("Time in seconds") and tick marks. ## 8. Additional Observations - **Logarithmic Scale Impact**: Compresses high-frequency values (e.g., 10⁶ vs. 
10³ appears as a 3-order magnitude difference). - **Sparse Activity**: Most categories show negligible activity after 20 seconds. - **Dominance of Benign Traffic**: Accounts for >90% of total frequency at time = 0. ## 9. Language Declaration - **Primary Language**: English (all labels, axis titles, and legend text are in English). - **No Secondary Languages Detected**. </details> (d) Figure 4: Average distribution for Inter-Packet arrival time from destination to source. In NF3-UNSW-NB15, benign flows predominantly appear in shorter-length bins, suggesting quick, routine communications typical in normal network operations. In contrast, attack flows such as Backdoor and Worms exhibit longer flow lengths, indicating sustained connections possibly used for data exfiltration or maintaining persistent threats within the network. Benign flows in NF3-BoT-IoT are consistently short, reflecting typical user-generated traffic. However, DDoS and DoS attacks show a broad distribution across all flow lengths, highlighting their disruptive nature, which is characterised by both short and burst-like flows and prolonged attack durations to exhaust network resources. In the NF3-CSE-CIC-IDS2018 dataset, the flow lengths of benign traffic are moderately spread, indicating a variety of normal operations. Attack types such as DDoS and Brute Force attacks show significant occurrences at mid-range flow lengths, suggesting these attacks involve sequences of interactions that may be a part of the attack strategy to probe or compromise the network. Lastly, FLD in NF3-ToN-IoT highlights notable distinctions between benign traffic and attack types such as MITM, Injection, and Password attacks. The majority of benign flows are short, which is consistent with normal operational traffic. Attack flows, particularly Password and MITM, demonstrate variability in their length distributions, reflecting the diverse tactics employed, from quick compromise attempts to more extended unauthorised access. 
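The per-class flow-length histograms discussed above can be sketched roughly as follows. This is a minimal illustration only: it assumes each flow is given as a hypothetical `(label, length)` pair with length counted in packets, and it uses decade-wide (base-10) bins. Neither the input format nor the binning is necessarily what was used to produce the paper's figures.

```python
import math
from collections import Counter, defaultdict


def log_bin(length: float) -> int:
    """Return decade bin k, where bin k covers [10**k, 10**(k + 1)).

    Lengths below 1 (including 0) are folded into bin 0.
    """
    if length < 1:
        return 0
    return int(math.floor(math.log10(length)))


def flow_length_distribution(flows):
    """Count flows per (label, decade bin); `flows` yields (label, length) pairs."""
    hist = defaultdict(Counter)
    for label, length in flows:
        hist[label][log_bin(length)] += 1
    return hist


def share_in_bin(hist, label, bin_index=0):
    """Fraction of a class's flows that fall into `bin_index` (0 = shortest flows)."""
    total = sum(hist[label].values())
    return hist[label][bin_index] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy flows: (class label, packets per flow).
    flows = [("Benign", 3), ("Benign", 5), ("Benign", 40), ("Worms", 250), ("Worms", 1200)]
    hist = flow_length_distribution(flows)
    print(dict(hist["Benign"]), dict(hist["Worms"]))  # {0: 2, 1: 1} {2: 1, 3: 1}
    print(round(share_in_bin(hist, "Benign"), 3))     # 0.667
```

Decade bins keep short benign flows and very long attack flows readable on the same axis, which matches the logarithmic presentation used throughout this section.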
Across all datasets, benign flows commonly populate the shortest flow-length bins, reflecting typical, efficient network communications. Depending on their nature, attack flows either mimic benign profiles or exhibit extended lengths indicative of malicious activity. Such patterns are valuable for developing effective security measures, as they allow traffic to be characterised by flow length, which strengthens anomaly detection.

5.2 Inter-Packet Arrival Time

Analysing histograms of the IAT distribution provides insight into how different types of network activities and attacks shape network behaviour. Consistent IATs typically indicate smooth traffic flow, while variability can reveal issues such as congestion or uneven data transmission. In this subsection, we focus on the average IAT in the four NetFlow datasets. Figures 3 and 4 show the distributions of these averages, capturing the timing dynamics of all communications between sources and destinations in each dataset: Figure 3 covers the source-to-destination direction, and Figure 4 covers the opposite, destination-to-source direction. These plots highlight how IAT varies between benign and malicious traffic, offering clues about network dynamics under different conditions.

Each dataset reveals distinct IAT patterns for different attack types. For example, NF3-ToN-IoT shows distinct peaks for more sophisticated attacks such as MITM (Man-in-the-Middle) and Backdoor at specific IAT intervals, possibly reflecting the tactical nature of these attacks, which may involve periodic signalling or data exfiltration. Similarly, NF3-UNSW-NB15 shows how diverse attack types such as Worms, Shellcode, and Exploits are distributed across different IAT ranges, highlighting the varied timing strategies used by different exploits.
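The following is a rough illustration of how a directional average IAT and its histogram might be computed. It assumes per-packet timestamps (in seconds) tagged with the sending host. NetFlow records do not carry these timestamps, and the NF3 datasets already include such averages as flow features. The 5-second bin width and the coefficient-of-variation helper are illustrative choices, not taken from the paper. The helper offers one way to quantify the contrast between consistent and variable IATs.

```python
from collections import Counter
from statistics import mean, pstdev


def gaps(timestamps):
    """Inter-arrival gaps (seconds) between consecutive timestamps, after sorting."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def average_iat(timestamps):
    """Mean inter-arrival time, or None when fewer than two packets were seen."""
    g = gaps(timestamps)
    return mean(g) if g else None


def iat_cv(timestamps):
    """Coefficient of variation of the gaps: 0 for perfectly regular timing, larger when bursty."""
    g = gaps(timestamps)
    if not g or mean(g) == 0:
        return None
    return pstdev(g) / mean(g)


def directional_average_iats(packets, src):
    """Split (timestamp, sender) packets into src->dst and dst->src, averaging each direction."""
    forward = [t for t, sender in packets if sender == src]
    backward = [t for t, sender in packets if sender != src]
    return average_iat(forward), average_iat(backward)


def iat_histogram(avg_iats, bin_width=5.0):
    """Count average IATs per fixed-width bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(v // bin_width) for v in avg_iats if v is not None)


if __name__ == "__main__":
    # Hypothetical packets of one flow: (timestamp in seconds, sending host).
    packets = [(0.0, "A"), (0.5, "B"), (1.0, "A"), (2.0, "A"), (4.5, "B")]
    print(directional_average_iats(packets, src="A"))  # (1.0, 4.0)
```

Computing the two directions separately mirrors the split between Figure 3 (source to destination) and Figure 4 (destination to source).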
In NF3-BoT-IoT, benign traffic is characterised by shorter IATs, frequently in the lower millisecond range, which indicates regular, uninterrupted network flow. In contrast, malicious activities such as DoS and DDoS attacks show a wider distribution of average IAT values, with notable peaks at longer intervals. This reflects the irregular timing typical of attacks that disrupt normal traffic. Comparing these plots across datasets shows how different network environments and attack vectors can influence IAT distributions. It also underscores the importance of considering context and environment when analysing network traffic, as the same type of attack may exhibit different IAT characteristics in different datasets.

5.3 Number of Flows vs. Time

When analysing traffic over time, it is important to track the distribution of attack classes within the relevant time intervals. This shows how many flows are labelled benign or malicious at any point, giving a clearer picture of traffic behaviour. In this subsection, we represent the traffic as a time series for each attack class to pinpoint when each class occurs. Most of the datasets were recorded over multiple days to simulate real-world conditions. As depicted in Figure 5, we chose one representative day from each dataset and aggregated the traffic per minute. We then plotted the flow counts on a logarithmic scale for clearer visual interpretation.

(a) Number of Flows (log scale) against Time in minutes; classes: Benign, DoS, DDoS.
(b) Number of Flows (log scale) against Time in minutes; classes: Benign, DoS.
(c) Number of Flows (log scale) against Time in minutes; classes: Benign, DDoS, DoS, Injection.
(d) Number of Flows (log scale) against Time in minutes; classes: Analysis, Fuzzers, Backdoor, Generic, Benign, Reconnaissance, DoS, Shellcode, Exploits, Worms.

Figure 5: Temporal Distribution of Network Traffic Across Four Datasets. This figure illustrates the minute-by-minute network traffic for the NF3 datasets on representative days, showing the onset, duration, and termination of various attack classes alongside benign traffic.

Starting with day 1 of NF3-UNSW-NB15, all attack classes occur concurrently throughout the day. This produces a complex overlay of multiple threats, which is characteristic of sophisticated real-world attack scenarios. This simultaneous occurrence calls for additional analysis techniques to isolate and identify individual attack vectors. On day 1 of NF3-BoT-IoT, there are clear periods of intense DDoS and DoS attacks, with sharp increases in flow counts followed by periods of lower activity. This pattern suggests that the attacks were launched in waves, a common denial-of-service tactic for overwhelming systems periodically. On the fifth day of the NF3-CSE-CIC-IDS2018 dataset, the distribution reveals a dominant presence of benign traffic, with intermittent spikes in DoS attack flows.
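The sketch below approximates a minute-by-minute view like Figure 5, together with the onset, duration, and termination of each class. For instance, it could locate when intermittent DoS spikes such as these begin and end. It is a sketch under stated assumptions: each flow is reduced to a hypothetical `(label, start_timestamp)` pair with timestamps in epoch seconds, relative to a chosen `day_start`. The `attack_episodes` helper and its `max_gap` tolerance are illustrative additions, not part of the paper's pipeline.

```python
from collections import Counter, defaultdict


def flows_per_minute(flows, day_start):
    """Count flows per label and minute offset from `day_start` (epoch seconds)."""
    counts = defaultdict(Counter)
    for label, ts in flows:
        counts[label][int((ts - day_start) // 60)] += 1
    return counts


def attack_episodes(minute_counts, max_gap=0):
    """Group active minutes into (onset, termination) pairs.

    Runs separated by at most `max_gap` idle minutes are merged into one episode.
    """
    episodes = []
    for minute in sorted(m for m, c in minute_counts.items() if c > 0):
        if episodes and minute - episodes[-1][1] <= max_gap + 1:
            episodes[-1][1] = minute  # extend the current episode
        else:
            episodes.append([minute, minute])  # start a new episode
    return [tuple(e) for e in episodes]


if __name__ == "__main__":
    day_start = 1_700_000_000  # hypothetical start of the recorded day
    flows = [("Benign", day_start), ("DoS", day_start + 6005), ("DoS", day_start + 6030),
             ("DoS", day_start + 6060), ("DoS", day_start + 9000), ("DoS", day_start + 9120)]
    per_minute = flows_per_minute(flows, day_start)
    print(dict(per_minute["DoS"]))                        # {100: 2, 101: 1, 150: 1, 152: 1}
    print(attack_episodes(per_minute["DoS"]))             # [(100, 101), (150, 150), (152, 152)]
    print(attack_episodes(per_minute["DoS"], max_gap=1))  # [(100, 101), (150, 152)]
```

The per-minute counts can then be plotted with a logarithmic y-axis, for example via matplotlib's `Axes.set_yscale("log")`. Raising `max_gap` treats short lulls inside a wave as part of the same episode.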
The attack patterns appear as short-lived bursts rather than continuous flooding. This suggests controlled execution, possibly mimicking real-world attack scenarios or stress-testing conditions. Lastly, NF3-ToN-IoT on day 5 displays separate, distinct instances of DDoS, DoS, and Injection attacks alongside periods of benign activity. Throughout the day, benign traffic remains consistent and mostly at a low flow level. This is expected of a synthetic dataset designed to maintain a baseline for comparison. The attacks neither overlap nor appear related. Even so, the dataset captures distinct and varied attack dynamics within the same day, so each threat type can be analysed under controlled conditions. While this analysis focuses on a single representative day for each dataset, we conducted similar examinations across all active days of each dataset. A full-period view of this kind is needed to judge how variable or consistent attack behaviours are over extended periods. The results underscore the diversity of attack methodologies and their temporal characteristics. These vary not only from day to day but also from one dataset to another. After plotting the full capture period of each dataset, we found that most attack classes were carried out separately on different days. The exception is NF3-UNSW-NB15, in which all attacks were injected simultaneously. Multiple simultaneous attacks can occur in real-life scenarios. Nevertheless, we recommend that researchers analyse each class individually to better understand its pattern. Table 4 lists the active days of each dataset together with the specific attacks carried out on each day.
This tabulation helps quantify the extent and variety of network attacks captured in the datasets. It also provides a reference for further analysis or model training.

[Figure panels (a) to (d): volume (log scale) of IB, OB, IP, and OP versus time in minutes for the four representative days, with a shared legend strip.]

Figure 6: Time series representation of numerical fields in NF3-Datasets: IB, OB, IP, and OP. The x-axis represents time aggregated in minutes, and the y-axis shows the volume of each feature, illustrating fluctuations and patterns in network traffic over time.
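The per-minute class counts plotted in Figure 5 and the per-day attack inventory in Table 4 both reduce to grouping flows by a time bucket. The Python sketch below makes several assumptions. Each flow is a dict carrying a `FLOW_START_MILLISECONDS` epoch timestamp and an `Attack` label, and benign flows are labelled `"Benign"`. Day boundaries are taken in UTC. The helper names are hypothetical, and the sketch is not the authors' code.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed (hypothetical) record layout: one dict per flow with an epoch timestamp
# in milliseconds and a class label, where benign flows are labelled "Benign".
TIME_FIELD = "FLOW_START_MILLISECONDS"
LABEL_FIELD = "Attack"


def flows_per_minute_by_class(flows):
    """Count flows per class and per minute since the first flow (cf. Figure 5)."""
    t0 = min(f[TIME_FIELD] for f in flows)
    counts = defaultdict(lambda: defaultdict(int))
    for f in flows:
        minute = (f[TIME_FIELD] - t0) // 60_000
        counts[f[LABEL_FIELD]][minute] += 1
    return {label: dict(per_minute) for label, per_minute in counts.items()}


def attacks_per_active_day(flows):
    """Map active day number (1, 2, ...) to its attack classes (cf. Table 4).

    Days are UTC calendar days (an assumption); a day with flows but no
    attack labels is reported as "Benign-Only".
    """
    days = defaultdict(set)
    for f in flows:
        day = datetime.fromtimestamp(f[TIME_FIELD] / 1000, tz=timezone.utc).date()
        attacks = days[day]  # accessing the key records the day even if benign
        if f[LABEL_FIELD] != "Benign":
            attacks.add(f[LABEL_FIELD])
    return {
        i: sorted(attacks) if attacks else ["Benign-Only"]
        for i, (_, attacks) in enumerate(sorted(days.items()), start=1)
    }


# Tiny synthetic example: a DoS burst on day 1, benign-only traffic on day 2.
flows = [
    {TIME_FIELD: 0, LABEL_FIELD: "Benign"},
    {TIME_FIELD: 30_000, LABEL_FIELD: "DoS"},
    {TIME_FIELD: 61_000, LABEL_FIELD: "DoS"},
    {TIME_FIELD: 86_400_000, LABEL_FIELD: "Benign"},
]
print(flows_per_minute_by_class(flows))  # {'Benign': {0: 1, 1440: 1}, 'DoS': {0: 1, 1: 1}}
print(attacks_per_active_day(flows))     # {1: ['DoS'], 2: ['Benign-Only']}
```

For a Figure 5-style panel, filter the flows to a single day before calling `flows_per_minute_by_class`. On a day containing every attack class, as in NF3-UNSW-NB15, this sketch lists each class rather than collapsing them to "All" as Table 4 does.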
Table 4: Attacks Implemented on Active Days for Each Dataset

| Day | NF3-UNSW-NB15 | NF3-CSE-CIC-IDS2018 | NF3-ToN-IoT | NF3-BoT-IoT |
| --- | --- | --- | --- | --- |
| 1 | All | BruteForce | Benign-Only | Reconnaissance |
| 2 | All | DoS | Benign-Only | Reconnaissance |
| 3 | Benign-Only | DoS | Benign-Only | Reconnaissance |
| 4 | — | DDoS | Scanning | DoS, DDoS |
| 5 | — | DDoS | DoS, Scanning | Theft |
| 6 | — | Web-Attack | DDoS, Injection, DoS | Theft |
| 7 | — | Web-Attack | DDoS, Password | — |
| 8 | — | Benign-Only | XSS, Password | — |
| 9 | — | Infiltration | Backdoor, Ransomware | — |
| 10 | — | Infiltration | MITM, Backdoor | — |
| 11 | — | BoT | — | — |

5.4 Time-Series Representation of NetFlow Features

Monitoring network traffic volume over time is essential for understanding network behaviour. It also reveals trends or irregularities that static analysis may miss. Analysing traffic as a time series lets us detect variations in network load, identify peak-usage intervals, and observe how data-flow patterns change over time. This continuous observation builds an understanding of normal traffic behaviour. It also helps highlight anomalies or unusual patterns that could indicate underlying issues. In this subsection, we represent several numerical and categorical features of the datasets as time series to gain insight into the temporal dynamics of the traffic. These visualisations show how the features are distributed over time. They also illustrate the analyses made possible by the temporal information added in this version of the datasets.

5.4.1 Numerical Fields

In this analysis, we focus on four key numerical features: IN_BYTES (IB), IN_PKTS (IP), OUT_BYTES (OB), and OUT_PKTS (OP). These features gauge the volume of data moving into and out of the network and are critical for deciphering overall traffic patterns [38, 39, 40].
IB and OB measure the amount of data received and sent, respectively. They offer insight into data load, bandwidth usage, and potential congestion points. Likewise, IP and OP count the incoming and outgoing packets. These counts help assess packet transmission efficiency, hint at packet loss, and evaluate the balance of traffic flow. To monitor network traffic over time, we aggregate these features by minute. This granularity reveals detailed patterns and fluctuations that reflect the network's performance and utilisation. For a consistent and focused analysis, we use the same single-day snapshots as in the previous section, as shown in Figure 6. These time series reveal a symmetrical pattern between IB and OB, as well as between IP and OP. The volume of incoming bytes and packets closely mirrors that of outgoing bytes and packets over time, indicating balanced communication within the network. This symmetry is consistent with a stable network environment in which data inflow and outflow track each other, and it may reflect effective network management and robust infrastructure. The representative days also show dataset-specific dynamics. NF3-ToN-IoT and NF3-CSE-CIC-IDS2018, both on Day 5, show consistent levels of IB and OB. Their sporadic spikes may be linked to operational anomalies or specific events. In contrast, NF3-UNSW-NB15 Day 1 shows a notable early spike in OB. This suggests an event such as data exfiltration, or a large and potentially benign data transfer. Meanwhile, NF3-BoT-IoT Day 1 exhibits significant variability in OP. This is indicative of intermittent network attacks or disruptions and may point to susceptibility to external threats.
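Below is a minimal Python sketch of the per-minute aggregation behind Figure 6. It assumes each flow is a dict with a `FLOW_START_MILLISECONDS` epoch timestamp (in milliseconds) and the four counters named in the text. The record layout and helper names are hypothetical. The sketch also computes a Pearson correlation between the log-scaled per-minute IB and OB series. This is one possible way to put a number on the symmetry described above; it is our illustration, not a metric reported for the datasets.

```python
import math

# Assumed (hypothetical) layout: one dict per flow with an epoch timestamp in
# milliseconds and the four NetFlow counters discussed in the text.
FIELDS = ("IN_BYTES", "OUT_BYTES", "IN_PKTS", "OUT_PKTS")


def per_minute_volumes(flows, bucket_ms=60_000):
    """Sum each counter per time bucket; empty buckets are kept as zeros."""
    t0 = min(f["FLOW_START_MILLISECONDS"] for f in flows)
    t1 = max(f["FLOW_START_MILLISECONDS"] for f in flows)
    n_buckets = (t1 - t0) // bucket_ms + 1
    series = {name: [0] * n_buckets for name in FIELDS}
    for f in flows:
        bucket = (f["FLOW_START_MILLISECONDS"] - t0) // bucket_ms
        for name in FIELDS:
            series[name][bucket] += f[name]
    return series


def log_correlation(x, y):
    """Pearson correlation of log(1 + value); NaN if either series is constant."""
    lx = [math.log1p(v) for v in x]
    ly = [math.log1p(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sx = math.sqrt(sum((a - mx) ** 2 for a in lx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ly))
    return math.nan if sx == 0 or sy == 0 else cov / (sx * sy)


# Synthetic flows: two in minute 0, none in minute 1, one in minute 2.
flows = [
    {"FLOW_START_MILLISECONDS": 0, "IN_BYTES": 100, "OUT_BYTES": 80, "IN_PKTS": 2, "OUT_PKTS": 2},
    {"FLOW_START_MILLISECONDS": 20_000, "IN_BYTES": 50, "OUT_BYTES": 40, "IN_PKTS": 1, "OUT_PKTS": 1},
    {"FLOW_START_MILLISECONDS": 130_000, "IN_BYTES": 1000, "OUT_BYTES": 900, "IN_PKTS": 10, "OUT_PKTS": 9},
]
series = per_minute_volumes(flows)
print(series["IN_BYTES"], series["OUT_BYTES"])  # [150, 0, 1000] [120, 0, 900]
print(log_correlation(series["IN_BYTES"], series["OUT_BYTES"]))
```

Empty minutes are kept as zeros so that the four series stay aligned. Minutes that are idle in both series also push the correlation up, so restricting the computation to active minutes gives a stricter test.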
5.4.2 Categorical Fields

Categorical features, such as source and destination IPs and ports, offer valuable insight into the structure and behaviour of network traffic. Tracking the number of unique IPs and ports over time reveals communication patterns and shows which devices are actively engaged in the network. It also shows the diversity of traffic, i.e., whether it is distributed across many endpoints or concentrated on specific services. In addition, monitoring these features helps detect unusual behaviour, such as sudden increases in unique IPs or port activity, which could indicate irregular network events [33]. NIDS datasets often vary significantly in the number of unique IP addresses and ports they capture, reflecting differences in the scope and diversity of their traffic. Table 5 shows the number of unique IPs and ports in each of the proposed datasets.

Table 5: Count of unique categorical fields in NF3-Datasets

| Dataset | IPV4_SRC_ADDR | IPV4_DST_ADDR | L4_SRC_PORT | L4_DST_PORT |
| --- | --- | --- | --- | --- |
| NF3-UNSW-NB15 | 40 | 40 | 64,620 | 64,631 |
| NF3-CSE-CIC-IDS2018 | 183,806 | 29,226 | 65,325 | 63,353 |
| NF3-ToN-IoT | 15,396 | 9,011 | 65,536 | 65,536 |
| NF3-BoT-IoT | 20 | 291 | 65,536 | 65,536 |

As in the previous subsection, Figure 7 visualises four categorical features for the same one-day snapshots: the unique source and destination IP addresses and ports. The x-axis represents time in minutes, and the y-axis shows the number of distinct values observed within each minute. The counts are aggregated per minute, but the same data can be binned at the level of seconds or finer to monitor traffic in more detail. Here, we emphasise the utility of tracking categorical features over time, since it can help detect certain types of anomalies related to source and destination IPs and ports.
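Counting distinct values per time bucket, as plotted in Figure 7, only needs one set per field and bucket. The Python sketch below assumes each flow is a dict with a `FLOW_START_MILLISECONDS` epoch timestamp and the four categorical NetFlow fields named in the code; the layout and helper names are hypothetical. The `bucket_ms` parameter is an illustrative knob: 60,000 ms gives per-minute counts, and 1,000 ms gives per-second counts. `unique_counts_total` gives whole-dataset counts in the style of Table 5.

```python
from collections import defaultdict

# Assumed (hypothetical) layout: one dict per flow with an epoch timestamp in
# milliseconds and the four categorical NetFlow fields named below.
CATEGORICAL = ("IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT")


def unique_counts_per_bucket(flows, bucket_ms=60_000):
    """Number of distinct values of each field per time bucket (cf. Figure 7).

    bucket_ms=60_000 gives per-minute counts; 1_000 gives per-second counts.
    """
    t0 = min(f["FLOW_START_MILLISECONDS"] for f in flows)
    seen = defaultdict(lambda: {name: set() for name in CATEGORICAL})
    for f in flows:
        bucket = (f["FLOW_START_MILLISECONDS"] - t0) // bucket_ms
        for name in CATEGORICAL:
            seen[bucket][name].add(f[name])
    return {
        bucket: {name: len(values) for name, values in fields.items()}
        for bucket, fields in sorted(seen.items())
    }


def unique_counts_total(flows):
    """Distinct values of each field over the whole input (cf. Table 5)."""
    return {name: len({f[name] for f in flows}) for name in CATEGORICAL}


# Synthetic example: two ordinary flows in minute 0, then one host probing
# ten destination ports in minute 1 (a scan-like jump in unique L4_DST_PORT).
flows = [
    {"FLOW_START_MILLISECONDS": 0, "IPV4_SRC_ADDR": "10.0.0.5",
     "IPV4_DST_ADDR": "10.0.0.1", "L4_SRC_PORT": 50000, "L4_DST_PORT": 443},
    {"FLOW_START_MILLISECONDS": 10_000, "IPV4_SRC_ADDR": "10.0.0.6",
     "IPV4_DST_ADDR": "10.0.0.1", "L4_SRC_PORT": 50001, "L4_DST_PORT": 443},
] + [
    {"FLOW_START_MILLISECONDS": 60_000 + port, "IPV4_SRC_ADDR": "10.0.0.9",
     "IPV4_DST_ADDR": "10.0.0.1", "L4_SRC_PORT": 40000, "L4_DST_PORT": port}
    for port in range(20, 30)
]
per_minute = unique_counts_per_bucket(flows)
print(per_minute[0]["L4_DST_PORT"], per_minute[1]["L4_DST_PORT"])  # 1 10
print(unique_counts_total(flows))
# {'IPV4_SRC_ADDR': 3, 'IPV4_DST_ADDR': 1, 'L4_SRC_PORT': 3, 'L4_DST_PORT': 11}
```

Unique destination ports jump from 1 to 10 between the two minutes. This is the kind of sudden increase in port activity that the text associates with irregular events such as scans.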
[Figure panels (a) to (d): number of unique items (log scale) versus time in minutes for the four representative days, with a shared legend strip.]

Figure 7: Representation of categorical features in NF3-Datasets: IPV4_SRC_ADDR, IPV4_DST_ADDR, L4_SRC_PORT, and L4_DST_PORT. The x-axis represents time aggregated in minutes, and the y-axis shows the number of unique values of each field per minute, highlighting the diversity of network activity over time.

In NF3-CSE-CIC-IDS2018 Day 5, the count of unique source IPs (IPV4_SRC_ADDR) remains relatively steady. This suggests consistent activity from a stable set of sources throughout the day. Minor fluctuations in destination IPs (IPV4_DST_ADDR) may indicate interactions with a variety of external services or hosts. Source ports (L4_SRC_PORT) are stable apart from an occasional sharp spike, possibly pointing to a brief period of heightened activity or an anomaly. Destination ports (L4_DST_PORT) are similarly stable, suggesting regular communication patterns without significant anomalies. For NF3-ToN-IoT Day 5, both source and destination IPs exhibit peaks, notably in destination IPs. These peaks could signify interactions with various external systems, such as external data exchanges or scanning activity. Periodic spikes in both source and destination ports may indicate batched communications or network scans. Together, they suggest an environment where network interactions are dynamic and potentially exposed to security breaches. The NF3-UNSW-NB15 Day 1 data shows little variation in both source and destination IPs. This is indicative of a controlled environment in which only a limited number of IPs are engaged.
Such low variation suggests established, routine communication patterns; the port counts are likewise consistent, in line with a network that experiences few irregularities and maintains a steady communication flow. In contrast, the NF3-BoT-IoT Day 1 plot shows a lower count of unique source IPs with occasional spikes, suggesting sporadic activation of new source IPs, possibly for command-and-control communication typical of a botnet scenario. Destination IPs show significant variability, likely related to the botnet's targets or a broader scope of victim engagement. The frequent changes in destination ports reflect dynamic interactions, potentially with multiple target machines or services, highlighting the erratic and potentially malicious nature of botnet activity in this dataset.

5.5 Time-Frequency Representation

Given the rich temporal information in network flows, various time- and frequency-domain signal processing techniques can be used to analyse network traffic. Time-frequency analysis is a key signal processing technique that allows signals to be examined simultaneously in the time and frequency domains, which can provide deeper insight into their underlying patterns. It is particularly suited to non-stationary signals, whose frequency content varies over time, such as speech, music, and biomedical signals [53, 54]. Network traffic is bursty [55]: volumes can change rapidly (e.g., sudden spikes in packet volume during an attack) or exhibit periodicity (e.g., daily traffic patterns), so it behaves as a time series with non-stationary properties [56]. Non-stationarity means that statistical properties such as the mean and variance change over time. Conventional frequency-domain approaches such as the Fourier transform summarise the frequency content of the whole signal and cannot show when a given frequency component occurs, so they cannot capture the time-varying nature of traffic patterns.
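The limitation of a global Fourier transform can be illustrated with a minimal, self-contained sketch. It uses synthetic sinusoids rather than NetFlow data, a naive O(n²) DFT for readability, and hypothetical helper names (`dft_magnitudes`, `tone`, `dominant_bin`). Two signals are built from the same slow and fast segments in opposite order. Their global magnitude spectra are identical, yet analysing each half separately, which is the idea behind a spectrogram, recovers when each frequency occurs.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform of x (naive O(n^2) DFT, for clarity)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)))
        for k in range(n)
    ]


def tone(cycles, length):
    """A sinusoid that completes `cycles` full periods over `length` samples."""
    return [math.sin(2 * math.pi * cycles * t / length) for t in range(length)]


def dominant_bin(segment):
    """Index of the strongest non-negative frequency bin in a segment."""
    mags = dft_magnitudes(segment)[: len(segment) // 2 + 1]
    return max(range(len(mags)), key=mags.__getitem__)


# Two synthetic signals made of the same segments in opposite order:
# a slow oscillation then a fast one, and vice versa.
slow = tone(2, 32)
fast = tone(8, 32)
a = slow + fast
b = fast + slow  # b is a circular shift of a by 32 samples

spec_a = dft_magnitudes(a)
spec_b = dft_magnitudes(b)

# Global view: the magnitude spectra are identical, so the Fourier transform
# reveals which frequencies are present but not when they occur.
print(max(abs(p - q) for p, q in zip(spec_a, spec_b)) < 1e-9)  # True

# Short-time view: analysing each half separately recovers the ordering.
print([dominant_bin(a[:32]), dominant_bin(a[32:])])  # [2, 8]
print([dominant_bin(b[:32]), dominant_bin(b[32:])])  # [8, 2]
```

Because `b` is a circular shift of `a`, the magnitude spectra match up to floating-point error: the timing information survives only in the phase of the transform, which is discarded when looking at magnitudes alone.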
Time-frequency representations may therefore reveal patterns and anomalies that are difficult to detect in the raw time-domain data.

[Figure 8, panels (a)–(i): spectrograms, Frequency (Hz) vs. Time (s), one panel per attack class]

Figure 8: Spectrogram representation of various attack classes in the NF3-UNSW-NB15 dataset.

Here, we explore one of these techniques, the spectrogram, to investigate the feasibility of such approaches for ML-based NIDS.
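A spectrogram is typically obtained from a short-time Fourier transform (STFT): the signal is cut into overlapping, windowed frames and the power spectrum of each frame is computed. The sketch below applies this to flow data under explicit assumptions. Flow start timestamps (hypothetically in milliseconds) are binned into a flow-count series, and the traffic is synthetic rather than drawn from NF3-UNSW-NB15. The function names, the 100 ms bin width, and the window parameters are illustrative choices, not settings taken from the paper. In practice, an FFT-based routine such as `scipy.signal.spectrogram` would replace the naive DFT used here.

```python
import cmath
import math


def flow_count_series(start_times_ms, bin_ms, duration_ms):
    """Count the flows that start in each fixed-width time bin."""
    counts = [0] * (duration_ms // bin_ms)
    for t in start_times_ms:
        idx = t // bin_ms
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts


def spectrogram(x, win_len, hop, fs):
    """Power spectrogram of x via a short-time Fourier transform (Hann window).

    Returns (frame_start_times_s, freqs_hz, power), where power[i][k] is the power
    at freqs_hz[k] in the frame starting at frame_start_times_s[i].
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    freqs = [k * fs / win_len for k in range(win_len // 2 + 1)]
    times, power = [], []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len]
        mean = sum(frame) / win_len  # remove the frame's mean level (DC component)
        seg = [(v - mean) * w for v, w in zip(frame, window)]
        row = [
            abs(sum(s * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n, s in enumerate(seg))) ** 2
            for k in range(len(freqs))
        ]
        times.append(start / fs)
        power.append(row)
    return times, freqs, power


# Hypothetical flow start times (ms) over 60 s: steady background traffic of one
# flow every 100 ms, plus one extra flow every 500 ms between 20 s and 40 s.
background = list(range(0, 60_000, 100))
burst = list(range(20_000, 40_000, 500))
counts = flow_count_series(background + burst, bin_ms=100, duration_ms=60_000)

fs = 1000 / 100  # 100 ms bins -> 10 samples per second
times, freqs, power = spectrogram(counts, win_len=50, hop=25, fs=fs)

# The 500 ms burst period shows up as energy at 2 Hz, only in frames that overlap it.
k = freqs.index(2.0)
print([t for t, row in zip(times, power) if row[k] > 1.0])
# [17.5, 20.0, 22.5, 25.0, 27.5, 30.0, 32.5, 35.0, 37.5]
```

The periodic burst appears as energy at 2 Hz (and its harmonics) only in the frames that overlap it, which is the kind of localised time-frequency pattern a spectrogram is meant to expose. The frame length trades time resolution against frequency resolution: the 5 s window used here gives 0.2 Hz frequency bins.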
The spectrogram is one of the most widely used time-frequency techniques for investigating how a signal varies over time. Using spectrograms, we can transform raw network flow time series into a richer representation that captures both frequency and temporal characteristics, potentially enhancing the performance of deep learning models. We focus on the NF3-UNSW-NB15 dataset. Figure 8 shows the spectrogram of the most frequently repeated pattern for each attack class. As can be seen, the spectrograms of different classes vary significantly in some cases. For instance, while DoS and Worms share some similarities, their patterns remain distinct from each other and from all other attack classes. Similarly, Fuzzers display a unique time-frequency signature that differentiates them from other attack types. These results highlight the potential of time-frequency representations to enhance ML-based NIDS by providing a more detailed characterisation of network traffic patterns.

6 Conclusion

The increasing complexity of network traffic and the diversity of modern attacks necessitate the incorporation of temporal analysis into network intrusion detection. Many current attacks are no longer isolated events but adaptive, time-evolving processes that can exploit timing vulnerabilities and encrypted traffic to evade detection. For instance, Advanced Persistent Threats (APTs) unfold over extended periods, while low-and-slow attacks hide malicious activity within normal traffic patterns. Additionally, the prevalence of encrypted protocols and the inadequacy of static analysis make temporal features (inter-packet arrival times, flow durations, traffic bursts) essential for detecting subtle attack behaviours. By analysing temporal dynamics, i.e., how the entities in a network and the relationships between them change over time, researchers and practitioners can gain a deeper understanding of the evolving nature of network threats, enabling more effective detection and mitigation strategies.
Despite their importance, comprehensive temporal features have been largely absent from existing NetFlow-based NIDS datasets, limiting researchers' ability to study attack patterns over time across multiple datasets. In this paper, we address this gap by introducing a collection of four standardised NetFlow-based NIDS datasets enriched with detailed temporal features. These datasets, the NF3 collection, provide a solid foundation for researchers and practitioners to study the temporal dynamics of network traffic. By incorporating precise flow start and end times, as well as detailed inter-packet arrival time statistics, they enable a deeper understanding of attack patterns and network behaviour over time. Our primary contribution is an extensive temporal analysis that reveals the dynamics of network traffic and security threats. By visualising traffic distributions, flow length distributions by attack class, and time-frequency representations, this study provides new insights into network behaviour patterns. By making these temporal feature-enriched NetFlow datasets (NF3-Datasets) publicly available [1], we aim to support ongoing research and development in ML-based network intrusion detection systems.

While this work highlights the importance of temporal features in NIDS, several challenges remain open for future exploration. Future research should focus on optimising ML models to effectively leverage the temporal features introduced in this study. Further work is also needed to refine time-frequency-based approaches and evaluate their practicality in real-time intrusion detection scenarios. Finally, investigating sequential learning models such as recurrent neural networks (RNNs) and transformers may yield new insights into how temporal information can be exploited to improve attack detection.
References
- [1] Majed Luay, Siamak Layeghy, Seyedehfaezeh Hosseininoorbin, Mohanad Sarhan, Nour Moustafa, and Marius Portmann. NetFlow V3 NIDS Datasets. The University of Queensland, 2025. Available at: https://staff.itee.uq.edu.au/marius/NIDS_datasets/.
- [2] Gernot Vormayr, Joachim Fabini, and Tanja Zseby. Why are my flows different? A tutorial on flow exporters. IEEE Communications Surveys & Tutorials, 22(3):2064–2103, 2020.
- [3] Muhammad Fahad Umer, Muhammad Sher, and Yaxin Bi. Flow-based intrusion detection: Techniques and challenges. Computers & Security, 70:238–254, 2017.
- [4] Markus Ring, Sarah Wunderlich, Deniz Scheuring, Dieter Landes, and Andreas Hotho. A survey of network-based intrusion detection data sets. Computers & Security, 86:147–167, 2019.
- [5] Satish Kumar, Sunanda Gupta, and Sakshi Arora. Research trends in network-based intrusion detection systems: A review. IEEE Access, 9:157761–157779, 2021.
- [6] Oluwadamilare Harazeem Abdulganiyu, Taha Ait Tchakoucht, and Yakub Kayode Saheed. A systematic literature review for network intrusion detection system (IDS). International Journal of Information Security, 22(5):1125–1162, 2023.
- [7] Martin Roesch et al. Snort: Lightweight intrusion detection for networks. In LISA, volume 99, pages 229–238, 1999.
- [8] Yang Guo. A review of machine learning-based zero-day attack detection: Challenges and future directions. Computer Communications, 198:175–185, 2023.
- [9] Rafath Samrin and D Vasumathi. Review on anomaly based network intrusion detection system. In 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), pages 141–147. IEEE, 2017.
- [10] Seyedehfaezeh Hosseininoorbin, Siamak Layeghy, Mohanad Sarhan, Raja Jurdak, and Marius Portmann. Exploring edge TPU for network intrusion detection in IoT. Journal of Parallel and Distributed Computing, 179:104712, 2023.
- [11] Giovanni Apruzzese, Luca Pajola, and Mauro Conti. The cross-evaluation of machine learning-based network intrusion detection systems. IEEE Transactions on Network and Service Management, 19(4):5152–5169, 2022.
- [12] Ramjee Prasad and Vandana Rohokale. Artificial intelligence and machine learning in cyber security. In Cyber Security: The Lifeline of Information and Communication Technology, pages 231–247, 2020.
- [13] Liam Daly Manocchio, Siamak Layeghy, Wai Weng Lo, Gayan K Kulatilleke, Mohanad Sarhan, and Marius Portmann. FlowTransformer: A transformer framework for flow-based network intrusion detection systems. Expert Systems with Applications, 241:122564, 2024.
- [14] Ankit Thakkar and Ritika Lohiya. A review of the advancement in intrusion detection datasets. Procedia Computer Science, 167:636–645, 2020. International Conference on Computational Intelligence and Data Science.
- [15] Giovanni Apruzzese, Pavel Laskov, and Johannes Schneider. SoK: Pragmatic assessment of machine learning for network intrusion detection. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pages 592–614, 2023.
- [16] Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, and Marius Portmann. NetFlow datasets for machine learning-based network intrusion detection systems. In Zeng Deze, Huan Huang, Rui Hou, Seungmin Rho, and Naveen Chilamkurti, editors, Big Data Technologies and Applications, pages 117–135, Cham, 2021. Springer International Publishing.
- [17] Mohanad Sarhan, Siamak Layeghy, and Marius Portmann. Towards a standard feature set for network intrusion detection system datasets. Mobile Networks and Applications, pages 1–14, 2022.
- [18] Benoît Claise. Cisco Systems NetFlow Services Export Version 9. RFC 3954, October 2004.
- [19] Ziadoon K. Maseer, Robiah Yusof, Baidaa Al-Bander, Abdu Saif, and Qusay Kanaan Kadhim. Meta-analysis and systematic review for anomaly network intrusion detection systems: Detection methods, dataset, validation methodology, and challenges, 2023.
- [20] Nour Moustafa and Jill Slay. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6, 2015.
- [21] Nickolaos Koroniotis, Nour Moustafa, Elena Sitnikova, and Benjamin Turnbull. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems, 100:779–796, 2019.
- [22] Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021.
- [23] Iman Sharafaldin, Arash Habibi Lashkari, Ali A Ghorbani, et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
- [24] Seyedehfaezeh Hosseininoorbin, Siamak Layeghy, Brano Kusy, Raja Jurdak, and Marius Portmann. HARBIC: Human activity recognition using bi-stream convolutional neural network with dual joint time–frequency representation. Internet of Things, 22:100816, 2023.
- [25] Seyedehfaezeh Hosseininoorbin, Siamak Layeghy, Brano Kusy, Raja Jurdak, Greg J. Bishop-Hurley, Paul L Greenwood, and Marius Portmann. Deep learning-based cattle behaviour classification using joint time-frequency data representation. Computers and Electronics in Agriculture, 187:106241, 2021.
- [26] Adnan Shahid Khan, Zeeshan Ahmad, Johari Abdullah, and Farhan Ahmad. A spectrogram image-based network anomaly detection system using deep convolutional neural network. IEEE Access, 9:87079–87093, 2021.
- [27] Zeeshan Ahmad, Adnan Shahid Khan, Sehrish Aqeel, Azlina Ahmadi Julaihi, Seleviawati Tarmizi, Noralifah Annuar, and Mohammed Sayeeduddin Habeeb. S-ADS: Spectrogram image-based anomaly detection system for IoT networks. In 2022 Applied Informatics International Conference (AiIC), pages 105–110, 2022.
- [28] Shahid Tufail, Hugo Riggs, Mohd Tariq, and Arif I. Sarwat. Advancements and challenges in machine learning: A comprehensive review of models, libraries, applications, and algorithms. Electronics, 12(8), 2023.
- [29] Lubna Ali Hassan Ahmed, Yahia Abdalla Mohamed Hamad, and Ahmed Abdallah Mohamed Ali Abdalla. Network-based intrusion detection datasets: A survey. In 2022 International Arab Conference on Information Technology (ACIT), pages 1–7, 2022.
- [30] Mossa Ghurab, Ghaleb Gaphari, Faisal Alshami, Reem Alshamy, and Suad Othman. A detailed analysis of benchmark datasets for network intrusion detection system. Asian Journal of Research in Computer Science, 7(4):14–33, 2021.
- [31] Siamak Layeghy, Marcus Gallagher, and Marius Portmann. Benchmarking the benchmark: Comparing synthetic and real-world network IDS datasets. Journal of Information Security and Applications, 80:103689, 2024.
- [32] Robert Flood and David Aspinall. Measuring the complexity of benchmark NIDS datasets via spectral analysis. In 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pages 335–341. IEEE, 2024.
- [33] Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric D. Kolaczyk, and Nina Taft. Structural analysis of network traffic flows. SIGMETRICS Perform. Eval. Rev., 32(1):61–72, June 2004.
- [34] George Nychis, Vyas Sekar, David G. Andersen, Hyong Kim, and Hui Zhang. An empirical evaluation of entropy-based traffic anomaly detection. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement, IMC '08, pages 151–156, New York, NY, USA, 2008. Association for Computing Machinery.
- [35] Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric D. Kolaczyk, and Nina Taft. Structural analysis of network traffic flows. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '04/Performance '04, pages 61–72, New York, NY, USA, 2004. Association for Computing Machinery.
- [36] Piotr Jurkiewicz, Grzegorz Rzym, and Piotr Boryło. Flow length and size distributions in campus Internet traffic. Computer Communications, 167:15–30, 2021.
- [37] Anshuman Chhabra and Mariam Kiran. Classifying elephant and mice flows in high-speed scientific networks. Proc. INDIS, pages 1–8, 2017.
- [38] Mosab Hamdan, Bushra Mohammed, Usman Humayun, Ahmed Abdelaziz, Suleman Khan, M. Akhtar Ali, Muhammad Imran, and M. N. Marsono. Flow-aware elephant flow detection for software-defined networks. IEEE Access, 8:72585–72597, 2020.
- [39] Kaihao Lou, Yongjian Yang, and Chuncai Wang. An elephant flow detection method based on machine learning. In Smart Computing and Communication: 4th International Conference, SmartCom 2019, Birmingham, UK, October 11–13, 2019, Proceedings 4, pages 212–220. Springer, 2019.
- [40] Spurthi Mallesh. Automatic detection of elephant flows through OpenFlow-based OpenVSwitch. PhD thesis, Dublin, National College of Ireland, 2017.
- [41] Li Ming Chen, Shun-Wen Hsiao, Meng Chang Chen, and Wanjiun Liao. Slow-paced persistent network attacks analysis and detection using spectrum analysis. IEEE Systems Journal, 10(4):1326–1337, 2016.
- [42] Theophilus Benson, Aditya Akella, and David A Maltz. Network traffic characteristics of data centers in the wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, pages 267–280, 2010.
- [43] Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, and Ronnie Chaiken. The nature of data center traffic: Measurements & analysis. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, IMC '09, pages 202–208, New York, NY, USA, 2009. Association for Computing Machinery.
- [44] Benoit Claise. Cisco Systems NetFlow Services Export Version 9. Technical report, Cisco Systems, 2004.
- [45] Andrea Corsini, Shanchieh Jay Yang, and Giovanni Apruzzese. On the evaluation of sequential machine learning for network intrusion detection. In Proceedings of the 16th International Conference on Availability, Reliability and Security, ARES '21, New York, NY, USA, 2021. Association for Computing Machinery.
- [46] Xueying Han, Rongchao Yin, Zhigang Lu, Bo Jiang, Yuling Liu, Song Liu, Chonghua Wang, and Ning Li. STIDM: A spatial and temporal aware intrusion detection model. In 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 370–377, 2020.
- [47] Yong Zhang, Xu Chen, Lei Jin, Xiaojuan Wang, and Da Guo. Network intrusion detection: Based on deep hierarchical network and original flow data. IEEE Access, 7:37004–37016, 2019.
- [48] Jiawei Zhao, Rahat Masood, and Suranga Seneviratne. A review of computer vision methods in network security. IEEE Communications Surveys & Tutorials, 23(3):1838–1878, 2021.
- [49] Abhishek Divekar, Meet Parekh, Vaibhav Savla, Rudra Mishra, and Mahesh Shirole. Benchmarking datasets for anomaly-based network intrusion detection: KDD CUP 99 alternatives. In 2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS), pages 1–8, 2018.
- [50] Mohanad Sarhan, Siamak Layeghy, and Marius Portmann. Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. Big Data Research, 30:100359, 2022.
- [51] Ntop. nProbe, an extensible NetFlow v5/v9/IPFIX probe for IPv4/v6, 2017. Accessed: 2024-05-21.
- [52] Noam Ben-Asher and Cleotilde Gonzalez. Effects of cyber security knowledge on attack detection. Computers in Human Behavior, 48:51–61, 2015.
- [53] Siamak Layeghy, Ghasem Azemi, Paul Colditz, and Boualem Boashash. Non-invasive monitoring of fetal movements using time-frequency features of accelerometry. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4379–4383. IEEE, 2014.
- [54] Siamak Layeghy, Ghasem Azemi, Paul Colditz, and Boualem Boashash. Classification of fetal movement accelerometry through time-frequency features. In 2014 8th International Conference on Signal Processing and Communication Systems (ICSPCS), pages 1–6. IEEE, 2014.
- [55] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson. On the self-similar nature of Ethernet traffic. IEEE/ACM Transactions on Networking, 2(1):1–15, 1994.
- [56] Yuguang Yang, Shupeng Geng, Baochang Zhang, Juan Zhang, Zheng Wang, Yong Zhang, and David Doermann. Long term 5G network traffic forecasting via modeling non-stationarity with deep learning. Communications Engineering, 2(1):33, 2023.
