The shell commands curl and wget can be called from Python (using os.system or subprocess.run) to download files from the internet. You can also download files directly with Python modules.
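
A minimal sketch of the shell-command approach, assuming curl or wget is installed and on the PATH. The helper names `build_download_command` and `download_with_shell_tool` are hypothetical and only serve as illustration; passing the arguments as a list avoids shell quoting issues.

```python
import subprocess


def build_download_command(tool, url, dest):
    """Return the argument list for downloading url to dest with curl or wget."""
    if tool == "curl":
        # -f: fail on HTTP errors, -sS: quiet but still report errors,
        # -L: follow redirects, -o: output file
        return ["curl", "-fsSL", "-o", dest, url]
    if tool == "wget":
        # -q: quiet, -O: output file
        return ["wget", "-q", "-O", dest, url]
    raise ValueError(f"unsupported tool: {tool}")


def download_with_shell_tool(tool, url, dest):
    """Run the external tool; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_download_command(tool, url, dest), check=True)
    return dest
```

For example, `download_with_shell_tool("curl", url, "/tmp/download_code_server.py")` would fetch the file used throughout this post.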

In [18]:

url = "http://www.legendu.net/media/download_code_server.py"

urllib.request.urlretrieve¶

urllib.request.urlretrieve downloads a file from the internet to a local path and returns a tuple of the local file name and the response headers. For more details, please refer to Hands on the urllib Module in Python.

In [14]:

import urllib.request

file, http_msg = urllib.request.urlretrieve(
    "http://www.legendu.net/media/download_code_server.py",
    "/tmp/download_code_server.py",
)

In [15]:

file

Out[15]:

'/tmp/download_code_server.py'

In [9]:

!ls /tmp/download_code_server.py

/tmp/download_code_server.py

In [16]:

http_msg

Out[16]:

<http.client.HTTPMessage at 0x7fe9efcaa358>

In [17]:

http_msg.as_string()

Out[17]:

'Server: GitHub.com\nContent-Type: application/octet-stream\nLast-Modified: Fri, 24 Jan 2020 20:21:29 GMT\nETag: "5e2b51c9-2de"\nAccess-Control-Allow-Origin: *\nExpires: Fri, 24 Jan 2020 20:34:29 GMT\nCache-Control: max-age=600\nX-Proxy-Cache: MISS\nX-GitHub-Request-Id: 6ACA:869A:42BECA:4B481B:5E2B527D\nContent-Length: 734\nAccept-Ranges: bytes\nDate: Fri, 24 Jan 2020 22:19:35 GMT\nVia: 1.1 varnish\nAge: 339\nConnection: close\nX-Served-By: cache-sea4480-SEA\nX-Cache: HIT\nX-Cache-Hits: 1\nX-Timer: S1579904375.100592,VS0,VE0\nVary: Accept-Encoding\nX-Fastly-Request-ID: 44fa67063caa264fc25f2cc26353c8dfc534ae66\n\n'
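
Rather than dumping all headers as one string, individual headers can be looked up by name, since http.client.HTTPMessage is a subclass of email.message.Message and supports case-insensitive lookups. The helper `summarize_headers` below is a hypothetical name, and the selection of headers is only an assumption about what is typically of interest.

```python
def summarize_headers(http_msg, names=("Content-Type", "Content-Length", "Last-Modified")):
    """Return the selected headers as a dict; missing headers map to None."""
    # Message.get is case-insensitive and returns None for absent headers.
    return {name: http_msg.get(name) for name in names}
```

Called as `summarize_headers(http_msg)` on the object returned by urlretrieve above, this returns a small dict instead of the full header dump.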

requests¶

Note that the output file must be opened in binary write mode (wb), because the response body is copied as raw bytes.

In [20]:

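import shutil
import sys

import requests

# stream=True defers fetching the body so that it can be copied in chunks.
resp = requests.get(url, stream=True)
if not resp.ok:
    sys.exit("Network issue!")
# resp.raw yields bytes, hence the binary mode "wb".
with open("/tmp/download_code_server_2.py", "wb") as fout:
    shutil.copyfileobj(resp.raw, fout)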

In [21]:

!ls /tmp/download_code_server_2.py

/tmp/download_code_server_2.py

In [22]:

!cat /tmp/download_code_server_2.py

#!/usr/bin/env python3
import urllib.request
import json


class GitHubRepoRelease:

    def __init__(self, repo):
        self.repo = repo
        url = f"https://api.github.com/repos/{repo}/releases/latest"
        self._resp_http = urllib.request.urlopen(url)
        self.release = json.load(self._resp_http)

    def download_urls(self, func=None):
        urls = [asset["browser_download_url"] for asset in self.release["assets"]]
        if func:
            urls = [url for url in urls if func(url)]
        return urls


if __name__ == '__main__':
    release = GitHubRepoRelease("cdr/code-server")
    url = release.download_urls(lambda url: "linux-x86_64" in url)[0]
    urllib.request.urlretrieve(url, "/tmp/code.tar.gz")
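
An alternative to copying resp.raw is to iterate over the body with iter_content, which also decodes gzip/deflate content encodings (resp.raw does not). The following is a sketch under that assumption; the helper name `save_response` and the chunk size are hypothetical choices, not part of requests.

```python
def save_response(resp, dest, chunk_size=8192):
    """Write the body of a streamed requests response to dest; return bytes written."""
    # Raises requests.HTTPError for 4xx/5xx responses instead of saving an error page.
    resp.raise_for_status()
    written = 0
    with open(dest, "wb") as fout:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            written += fout.write(chunk)
    return written


# Possible usage (requires the requests package and network access):
#
#     import requests
#     with requests.get(url, stream=True, timeout=30) as resp:
#         save_response(resp, "/tmp/download_code_server_2.py")
```

The timeout value in the usage comment is an arbitrary example; without one, requests can wait indefinitely on an unresponsive server.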

wget¶

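The wget module currently has no option to overwrite an existing file: if the target already exists, the download is saved under a new name with a numeric suffix, as the second call below shows. Overwriting can still be achieved by renaming/moving the downloaded file (using shutil).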

In [25]:

import wget

wget.download(url, out="/tmp/download_code_server_3.py")

Out[25]:

'/tmp/download_code_server_3.py'

In [26]:

import wget

wget.download(url, out="/tmp/download_code_server_3.py", bar=wget.bar_adaptive)

Out[26]:

'/tmp/download_code_server_3 (1).py'
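
The rename/move workaround mentioned above could be sketched as follows: download into a fresh temporary directory (so no name collision can occur) and then move the result over the target. The helper name `download_overwriting` is hypothetical, and the `download` argument is assumed to be a callable with the same `(url, out=...)` interface as wget.download.

```python
import os
import shutil
import tempfile


def download_overwriting(download, url, dest):
    """Download url via download(url, out=...) and replace dest if it exists."""
    dest = os.path.abspath(dest)
    # Create the temporary directory next to dest so the move stays on one filesystem.
    with tempfile.TemporaryDirectory(dir=os.path.dirname(dest)) as tmp_dir:
        tmp_path = download(url, out=os.path.join(tmp_dir, os.path.basename(dest)))
        shutil.move(tmp_path, dest)
    return dest
```

With the wget module this would be called as `download_overwriting(wget.download, url, "/tmp/download_code_server_3.py")`.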

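Configure a proxy for the Python module wget. The following uses the socks module (from the PySocks package) and replaces the default socket class, so that connections opened afterwards go through a SOCKS5 proxy on localhost.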

In [ ]:

import socket
import socks

socks.set_default_proxy(socks.SOCKS5, "localhost")
socket.socket = socks.socksocket

pycurl¶

In [29]:

import pycurl

with open("/tmp/download_code_server_4.py", "wb") as fout:
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, fout)
    c.perform()
    c.close()

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-29-0c73f10fc26e> in <module>
----> 1 import pycurl
      2 
      3 with open('/tmp/download_code_server_4.py', 'wb') as fout:
      4     c = pycurl.Curl()
      5     c.setopt(c.URL, url)

ModuleNotFoundError: No module named 'pycurl'

References¶

https://stackabuse.com/download-files-with-python/

https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python

Ben Chuanlong Du's Blog

Download Files from the Internet in Python
