rDNS in IP Intelligence Development
In today’s digital world, the internet has become an indispensable part of people’s daily lives and business activities. Within this vast and complex network ecosystem, IP addresses are fundamental for connecting and identifying various network devices and services. However, simply knowing a device’s IP address is not sufficient to fully understand its role and behavior within the network. To gain a deeper understanding and better utilize IP addresses, reverse DNS (rDNS) technology provides valuable information, helping us uncover more details about IP addresses.
What is rDNS?
In contrast to forward DNS, which maps a domain name to an IP address, rDNS maps an IP address back to a domain name by querying PTR records (CNAME records can appear along the way when a reverse zone is delegated). In forward DNS, many domain names can resolve to the same IP address, so an IP address can correspond to multiple domain names, a one-to-many relationship. A reverse lookup of an IP address, by contrast, typically returns zero or one hostname, which is effectively a one-to-one relationship. Typical uses of reverse DNS include:
- Verifying the sender’s identity
rDNS is widely used in the process of sending emails as part of preventing spam and phishing attacks. When an email is sent from a server, the receiving server can look at the IP address of the sending server and perform an rDNS query to confirm the domain name associated with that IP address. If the rDNS record matches the sending server’s claim, this can serve as a verification of the sending server’s identity, thereby increasing the likelihood of the email being legitimate. For example, AOL (America Online) requires that the sender’s mail server must be reverse resolvable to send emails to AOL/AIM mailboxes. For some large network providers, configuring the correct reverse DNS records is also part of establishing the trustworthiness and reliability of internet services.
- Network troubleshooting and tracking
In network management and troubleshooting, knowing the domain name information of the requesting device is very useful. rDNS can help network administrators identify specific hosts in network traffic and understand the source of the traffic, enabling them to diagnose issues effectively.
- Implementing security policies and access control
Some network services and applications may rely on reverse domain names to implement security policies or access control. Through rDNS, services can check the domain name behind the IP address of an access request to determine whether to allow access or apply specific rules. For example, a service can identify benign bots by checking the User-Agent request header together with the reverse resolution record of the IP to confirm whether a request comes from a genuine search engine crawler.
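The crawler check described above is often done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check that the hostname belongs to an expected domain, then resolve the hostname forward and confirm it maps back to the same IP. The following is a minimal sketch using only the Python standard library. The function names, the suffix list, and the injectable `reverse`/`forward` parameters are assumptions made for illustration, not part of any particular product.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional, Set


def ptr_query_name(ip: str) -> str:
    """Name that a PTR query for this address targets (in-addr.arpa or ip6.arpa)."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup through the system resolver; None if nothing is returned."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def forward_lookup(hostname: str) -> Set[str]:
    """Addresses the hostname resolves to; empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], Set[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper)."""
    hostname = reverse(ip)
    if not hostname:
        return False  # no PTR record: cannot be verified
    hostname = hostname.rstrip(".").lower()
    # The hostname must sit under one of the domains we trust for this crawler.
    if not any(hostname.endswith("." + suffix) for suffix in allowed_suffixes):
        return False
    # A PTR record alone can be set by whoever controls the IP range,
    # so the forward lookup must map the hostname back to the same IP.
    return ip in forward(hostname)
```

A call might look like `is_verified_crawler(client_ip, ["googlebot.com", "google.com"])`, where the suffix list should be taken from the crawler operator's own documentation. The resolver functions are passed in as parameters so that the logic can be exercised without network access.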
Deriving geographic location information from reverse DNS
Executing host 96.227.5.107 in the terminal yields pool-96-227-5-107.phlapa.east.verizon.net. The subdomain label phlapa suggests that the IP is located in Philadelphia, Pennsylvania, USA. Similarly, a reverse lookup for 63.153.137.40 returns 63-153-137-40.sxfl.qwest.net. Here the label sxfl is an abbreviation for Sioux Falls, South Dakota, USA. By examining the hostname, one can reasonably infer its geographic location, which illustrates that rDNS records can provide additional value for IP geolocation. Two questions arise, however:
- How many IP addresses in the entire IPv4 address space can be reverse-resolved?
- How many rDNS records containing geographic location information exist among the IP addresses that can be reverse-resolved?
In the paper “IP Geolocation through Reverse DNS”, we found a reasonably sound solution. The paper treats the extraction of location information from rDNS as a machine learning problem. For a given hostname, it generates a list of candidate locations and then uses a binary classifier on each hostname and candidate location pair to decide which candidates are plausible. Finally, the remaining candidates are ranked by confidence and correlated with population figures. This method effectively supplements current mainstream commercial IP Geo databases and improves their accuracy.
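To make the first step concrete, here is a minimal sketch of candidate generation only: split a hostname into tokens and look each one up in a table of location hints. This is not the paper's classifier or ranking. The `LOCATION_HINTS` table is a hypothetical, hand-made stand-in for a real gazetteer, and its two entries are simply the examples discussed above.

```python
import re
from typing import List

# Hypothetical hint table; a real system would use a large gazetteer
# (city names, abbreviations, airport codes) instead of two entries.
LOCATION_HINTS = {
    "phlapa": "Philadelphia, Pennsylvania, USA",
    "sxfl": "Sioux Falls, South Dakota, USA",
}


def candidate_locations(hostname: str) -> List[str]:
    """Return location candidates hinted at by the hostname's tokens."""
    candidates: List[str] = []
    for label in hostname.rstrip(".").lower().split("."):
        # Break each label on hyphens, underscores, and digit runs so that
        # "pool-96-227-5-107" contributes the token "pool".
        for token in re.split(r"[-_\d]+", label):
            location = LOCATION_HINTS.get(token)
            if location and location not in candidates:
                candidates.append(location)
    return candidates
```

In the approach the paper describes, each candidate produced by a step like this would then be scored by a classifier rather than accepted as is, since short tokens can easily collide with unrelated words.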
Figure 1: Distribution of rDNS in the IPv4 address space
Out of the roughly 4.3 billion addresses in the IPv4 address space, approximately 2.568 billion are public IP addresses. The graph above shows that 1.25 billion IPv4 addresses have valid rDNS. Among these 1.25 billion IPv4 addresses, around 160 million contain precise city matches, and 270 million contain airport codes. This means that only about 12% of all valid reverse DNS records contain useful geographic location information. Their coverage is insufficient to form a complete IP Geo database on its own, but they can help enrich existing IP geolocation information. Additionally, the paper analyzes the changes in rDNS from 2014 to 2018, showing that 64.8% of rDNS resolution records remained unchanged and that the number of reverse DNS records grows slowly each year. This is consistent with the latest rDNS data, which now stands at approximately 1.286 billion records.
Figure 2: Changes in rDNS hostnames from 2014 to 2018
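As a quick sanity check on the coverage figures quoted above, the shares can be recomputed directly. This sketch uses only the numbers stated in the text. Whether the city-match and airport-code groups overlap is not stated here, so the two shares are reported separately rather than added together.

```python
# Figures as quoted in the text above.
PUBLIC_IPV4 = 2_568_000_000
WITH_RDNS = 1_250_000_000
CITY_MATCH = 160_000_000
AIRPORT_CODE = 270_000_000


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


rdns_coverage = pct(WITH_RDNS, PUBLIC_IPV4)   # share of public IPv4 with valid rDNS
city_share = pct(CITY_MATCH, WITH_RDNS)       # share of rDNS records with a city match
airport_share = pct(AIRPORT_CODE, WITH_RDNS)  # share of rDNS records with an airport code

print(f"valid rDNS among public IPv4: {rdns_coverage}%")
print(f"city matches among valid rDNS: {city_share}%")
print(f"airport codes among valid rDNS: {airport_share}%")
```

Under these numbers, a little under half of the public IPv4 space has valid rDNS, and the city-match share lands close to the roughly 12% mentioned above.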
Commercial IP geolocation databases provided by companies like MaxMind, Neustar IP Intelligence, and IP2Location combine various sources of information and algorithms to achieve high coverage. These sources include WHOIS information, network latency data, network topology information, and collaboration with ISPs, as well as analyzing network social graphs. However, they may still lack location information for certain IP ranges. Utilizing network topology and latency information has several limitations:
(1) There is a need for globally distributed mapping node resources.
(2) Not every public IP allows ICMP ping or exposes its network topology.
(3) Route distances obtained through traceroute cannot fully map to real-world geographical distances.
However, utilizing rDNS for geolocation can overcome these limitations.
Deriving IP connection types from reverse DNS
Some reverse DNS resolution records may contain information about the connection type:
In this case, rDNS information can serve as a way to explore IP usage scenarios (such as residential, enterprise, or data center). Security products typically apply different levels of protection strategy to different IP types. For example, they may adopt more cautious and conservative measures for dynamic residential IPs, because the majority of real-user access comes from dynamic IPs that are shared among multiple users, which makes blocking them prone to false positives. For enterprise dedicated-line IPs, by contrast, the cost of a decision is not as high.
Network providers allocate dynamic IPs from IP pools, so it is difficult to confirm directly from WHOIS information and similar sources whether an IP is dynamic. Through rDNS, however, certain keywords can support a judgment. As a simple example, querying the rDNS database for PTR records that contain the keyword “dynamic” directly matches over 58 million IPs.
Figure 3: Number of matched IPs
Combining some other third-party threat intelligence data can preliminarily validate this hypothesis.
Figure 4: Third-party threat intelligence data
Of course, the actual number of dynamic IPs is much higher. Here, I’m just scratching the surface. As with geolocation, not every IP has an rDNS record, as noted in the geolocation section above. Reverse DNS records can therefore supplement and enrich the connection types of IPs, but relying solely on rDNS is not enough to constitute a complete database.
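The keyword matching described in this section can be sketched as follows. Only the keyword “dynamic” comes from the text. The other keywords, the label names, and the example hostnames are assumptions added for illustration, and a real rule set would need to be tuned per provider.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Only "dynamic" is taken from the text; the rest are illustrative guesses.
KEYWORD_HINTS = {
    "dynamic": ("dynamic", "dyn", "dhcp", "pool"),
    "static": ("static",),
}


def tokens(hostname: str) -> List[str]:
    """Split a hostname into lowercase alphabetic tokens."""
    return [t for t in re.split(r"[^a-z]+", hostname.lower()) if t]


def connection_hint(hostname: str) -> Optional[str]:
    """Return a connection-type label hinted at by the PTR hostname, if any."""
    found = set(tokens(hostname))
    for label, keywords in KEYWORD_HINTS.items():
        if found & set(keywords):
            return label
    return None


def count_hints(ptr_records: Dict[str, str]) -> Counter:
    """Tally labels over a mapping of IP address to PTR hostname."""
    return Counter(connection_hint(h) or "unknown" for h in ptr_records.values())
```

Matching on whole tokens rather than substrings is a deliberate choice here: it avoids a keyword such as "pool" matching inside an unrelated word, at the cost of missing hostnames that glue the keyword to other letters.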
Using ZDNS to perform IPv4 reverse DNS resolution
The paper “ZDNS: A Fast DNS Toolkit for Internet Measurement” introduces a high-performance and scalable DNS probing framework, consisting of three main components: a DNS library, core framework, and composable modules. The DNS library implements its own caching recursive resolver library, providing recursive queries, caching, validation, packet exchange records, and simplified DNS resolution. The core framework is used to simplify command-line interaction, while composable modules are used for easy functional expansion. The implementation of ZDNS follows several design principles:
- Internal recursion: ZDNS supports recursive resolution to understand various characteristics of DNS operations. This is because public recursive resolvers often hide many DNS resolution features and frequently impose query rate limits. Therefore, support for internal self-recursion is needed.
- Security: The DNS protocol is specified across many RFCs, and as of 2022, there are over 65 types of DNS records. Because servers often return incorrectly formatted responses due to misconfiguration or malicious operation, ZDNS is written in a memory-safe programming language (Go) and supports modular interfaces.
- High performance: When using external recursive resolvers, it performs 90,000 resolutions per second, scans 50 million domains within 10 minutes, and resolves the entire IPv4 address space’s PTR records within 12 hours.
The following figure shows ZDNS’s resolution performance data:
Figure 5: ZDNS Resolution Performance Data
Considering PTR records only, resolving the entire IPv4 address space through Google’s public DNS took just 12.1 hours, with a success rate of 93.0%. Internal recursive resolution, on the other hand, took 116.7 hours, with a success rate of 88.5%. The significant drop in speed with internal recursive resolution is partly due to the inability to leverage the cache of a public resolver to accelerate queries, and partly because the internal recursive mechanism has to issue several queries before it can return a response. Comparing A records with PTR records, as the number of resolutions increases from 50 million to several billion, ZDNS’s success rate decreases by less than 5%.
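The two durations above translate into average query rates with a short calculation. This sketch assumes that every one of the 2^32 IPv4 addresses was queried, which follows the phrase "entire IPv4 address space"; if reserved ranges were skipped, the real rates would be somewhat lower.

```python
# Assumption: all 2**32 addresses were queried.
IPV4_SPACE = 2 ** 32


def average_rate(total_queries: int, hours: float) -> float:
    """Average queries per second over a run of the given length."""
    return total_queries / (hours * 3600)


google_rate = average_rate(IPV4_SPACE, 12.1)     # public resolver run
internal_rate = average_rate(IPV4_SPACE, 116.7)  # internal recursion run
slowdown = 116.7 / 12.1                          # ratio of the two durations

print(f"public resolver: about {google_rate:,.0f} queries/s")
print(f"internal recursion: about {internal_rate:,.0f} queries/s")
print(f"slowdown: about {slowdown:.1f}x")
```

Under that assumption the public-resolver run averages just under 100,000 queries per second, in the same range as the 90,000 resolutions per second quoted in the design principles, while internal recursion is slower by a factor of nearly ten.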
Conclusion
Reverse DNS (rDNS) technology is an essential tool in network management and security, and it provides strong support for enriching IP address information and for identifying geographic locations and connection types.
When deriving geographical location information from rDNS, we can correlate domain names from rDNS records with a geographical location database, thereby inferring the geographical location information of IP addresses and providing data support for geolocation and location-aware services. This is crucial for services like network positioning and content delivery network optimization.
On the other hand, by deriving IP connection types from rDNS, we can classify and identify applications and services associated with IP addresses based on the domain names and service types found in rDNS records. This aids in network security and traffic management, enhancing visualization and understanding of network activities.
Utilizing tools such as ZDNS for IPv4 reverse DNS resolution enables rapid and efficient large-scale rDNS queries, facilitating network management and security analysis. The application of such tools provides technical support for practical network information gathering and analysis, contributing to the rational utilization of network resources and strengthening network security.