Check proxy IPs for connection status, speed, and detection, with Python


Objective: Test whether a proxy IP is alive at the time it is checked.

Notes: The list of proxy IPs should be newline-delimited. Note that no history of uptime is kept. For public proxy servers (which can go up and down at a moment’s notice), it’s possible that by the time all the IPs in the list are checked, the connection status of some proxies will have changed.

The “speed test” is simply the difference between the time at connection initiation and the time when the page is returned. Since this is influenced by your local connection environment, as well as by the server you’re using to check your external IP, the speed check is a very rough estimate. Additionally, only a sample size of one is taken, and it’s plausible that some IP checkers throttle repeated checks.
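The timing approach amounts to bracketing the request between two time.time() calls. A minimal illustration of that pattern, with a stand-in function rather than a real request (the helper below is not part of the original script):

```python
import time

def timed_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call -- a sample size of one."""
    start = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - start
    return result, elapsed
```

For repeated measurements, time.monotonic() would be the safer clock, since time.time() can jump if the system clock is adjusted mid-request.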

There is no fancy proxy detection code here (until I try my hand at writing it). Rather, it relies on whatismyip.com’s built-in proxy detection. http://whatismyipaddress.com/proxy-check is another online proxy detector I am thinking of including.

IP Checkers: This code only works with IP checkers that return just one string — the IP address. There’s no HTML parsing here.

Compatible IP address checkers include http://ip.42.pl/raw, http://www.icanhazip.com, and http://automation.whatismyip.com/n09230945.asp. I would recommend using whatismyip.com. Among the three, I believe only whatismyip.com has an automation policy, which asks that automated scripts query it only once every 5 minutes (hence the time.sleep(300) at the end of each check_proxy(pip) iteration).
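Since the script assumes the checker returns nothing but the address, a quick sanity check on the response can catch a checker that has started returning HTML. A sketch of such a check (IPv4 only; this helper is an illustration, not part of the script below):

```python
import re

_IPV4_RE = re.compile(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$')

def looks_like_ipv4(text):
    """True if the (stripped) response is a bare dotted-quad IPv4 address."""
    m = _IPV4_RE.match(text.strip())
    return bool(m) and all(0 <= int(octet) <= 255 for octet in m.groups())
```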

Code

#! /usr/bin/env python

import urllib2
import urllib
import time
import socket

# Set some global variables
proxy_list = open('proxy_list.txt', 'r')
ip_check_url = 'http://automation.whatismyip.com/n09230945.asp'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'
socket_timeout = 30

# Get real public IP address
def get_real_pip():
    req = urllib2.Request(ip_check_url)
    req.add_header('User-agent', user_agent)
    conn = urllib2.urlopen(req)
    page = conn.read()
    return page

# Set global variable containing "real" public IP address
real_pip = get_real_pip()

# Check proxy
def check_proxy(pip): 
    try:
        # Build opener
        proxy_handler = urllib2.ProxyHandler({'http':pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', user_agent)]
        urllib2.install_opener(opener)

        # Build, time, and execute request
        req = urllib2.Request(ip_check_url)
        time_start = time.time()
        conn = urllib2.urlopen(req)
        time_end = time.time()
        detected_pip = conn.read()

        # Calculate request time
        time_diff = time_end - time_start

        # Check if proxy is detected
        if detected_pip == real_pip:
            proxy_detected = True
        else:
            proxy_detected = False

    # Catch exceptions
    except urllib2.HTTPError, e:
        # print "ERROR: Code ", e.code
        return (True, False, 999)
    except Exception, detail:
        # print "ERROR: ", detail
        return (True, False, 999)

    # Return False if no exceptions, proxy_detected=True if proxy detected
    return (False, proxy_detected, time_diff)  

def main():
    socket.setdefaulttimeout(socket_timeout)

    print "Current Public IP: " + real_pip
    print

    for current_proxy in proxy_list:
        current_proxy = current_proxy.strip()
        (proxy_failed, proxy_detected, time_diff) = check_proxy(current_proxy)
        if proxy_failed:
            print ("  FAILED: " + current_proxy)
        else:
            if proxy_detected:
                print "  DETECTED: %s ( %ss )" % ( current_proxy, str(round(time_diff, 2)) )
            else:
                print "  WORKING: %s ( %ss )" % ( current_proxy, str(round(time_diff, 2)) )
        time.sleep(300)

if __name__ == '__main__':
    main()
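The script above targets Python 2 (urllib2, print statements). On Python 3 the same opener logic lives in urllib.request; the following is a rough sketch of check_proxy’s core, not a tested drop-in replacement. The per-request timeout argument stands in for the original’s socket.setdefaulttimeout call:

```python
import time
import urllib.request

def check_proxy_py3(pip, ip_check_url, user_agent, real_pip):
    """Python 3 sketch of check_proxy: returns (failed, detected, seconds)."""
    try:
        proxy_handler = urllib.request.ProxyHandler({'http': pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', user_agent)]
        time_start = time.time()
        with opener.open(ip_check_url, timeout=30) as conn:
            detected_pip = conn.read().decode().strip()
        time_diff = time.time() - time_start
        # Proxy is "detected" if the checker still sees your real IP
        return (False, detected_pip == real_pip, time_diff)
    except Exception:
        return (True, False, 999)
```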

References

The code above was originally based on Stack Overflow question #765305, which itself borrows some code from a blog post.
