Showing posts with label install. Show all posts

Thursday, June 19, 2014

Exercise 16: Decode a Web Page

My sincerest apologies for being late in posting these exercises. My boyfriend came to visit me in Jerusalem for the last two weeks, so I haven't had any spare cycles to tackle the Python problems. I should learn to write a few in advance in case I am ever put in this kind of situation again. In any case, this exercise should make up for the lack of exercises the last few weeks - it's a fun one. This is a slightly longer and more involved exercise than many previous ones, so I will not post the solution for two weeks, but I will post a new exercise next week. Enjoy!

Exercise


Use the BeautifulSoup and requests Python packages to print out a list of all the article titles on the New York Times homepage.

Discussion

Concepts for this week:
  • Libraries
  • requests
  • BeautifulSoup

Libraries

Many people have written Python libraries that do not come with the standard distribution of Python (unlike the random library mentioned in a previous post, which does). These libraries can do anything from machine learning to date and time formatting to meme generation. If you have a task you need done, most likely someone has written a library for it.

There are three main things to keep in mind when using a library:
  1. You need to install it. Installation on GNU/Linux-based systems will generally be easier than on Windows or OS X, but there will usually be documentation for how to do it.
  2. You need to import it. At the top of your program, make sure you write the line import requests, or whatever the name of your library is. Then you can use it to your heart's content.
  3. You need to read documentation. Someone else wrote it, so the rules might not be so obvious. Anyone (or any group) that writes a Python package writes documentation for it. Eventually, reading documentation will become second nature.
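
The first two steps can be checked from Python itself. The sketch below is one possible way to do it, using only the standard library. The helper name is_installed is made up for this illustration, and the comment about pip assumes that pip is the package manager you end up using.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "requests" and "bs4" are the import names of the two libraries used in
# this exercise; "random" ships with Python, so it should always be found.
for name in ["random", "requests", "bs4"]:
    if is_installed(name):
        print(name + " is installed and can be imported")
    else:
        # Assumption: pip is your package manager. The package that
        # provides "bs4" is published under the name "beautifulsoup4".
        print(name + " is missing; it needs to be installed first")
```

If a library shows up as missing, that is step 1 (installing) still waiting to be done; once it is found, a plain import line at the top of your program is all step 2 requires.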

Requests


One of the most useful libraries written for Python recently, requests does "HTTP for humans." What this means in layman's terms is that it lets a Python program ask the internet for things. When you type "facebook.com" into the browser, you are asking the internet to show you Facebook's homepage.

In the same way, a program can ask the internet something. It might not be "show me Facebook", but you can, for example, ask GitHub for a list of all the repositories that the user "mprat" has. You can do this with an API (Application Programming Interface). This exercise doesn't use APIs, so we'll talk more about those in a later post.

Back to showing the user a webpage. When I type "facebook.com" into the browser, Facebook sends my browser a bunch of HTML (basically, code for how the website looks). The browser then takes this HTML and shows it to me in a pretty way. (Fun fact: to see the HTML of any page in a browser, right click on the page and choose "Inspect Element" or "View Source", depending on your browser. In Chrome, "Inspect Element" will pop up a panel at the bottom of your page where you can see the HTML of the page. This trick will come in handy when you're doing the exercise. If you need to DO anything with this HTML, it is better to use a program. More posts about this are coming later.) If I want to "see" a webpage with a program, all I need to do is ask the server for its HTML and read it.

The 'requests' library does half of that job: it asks (requests, if you will) a server for information. This could be just data (through an API - more later) or, in the case of this exercise, HTML.
Look at the documentation for all the details you need. In the latest version at the time of writing, all you need to do to ask a website for its HTML is:

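import requests

url = 'http://github.com'
r = requests.get(url)  # send a GET request and wait for the server's response
r_html = r.text        # the body of the response, decoded into a string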

Now inside the variable r_html, you have the HTML of the page as a string. Reading (otherwise called parsing) happens with a different Python package.

BeautifulSoup


To solve our problem of parsing (reading, understanding, interpreting) the string of HTML we got from requests, we use the BeautifulSoup library.

What it does is give a hierarchical structure (like a pyramid, or a tree) to the HTML in the document. If you don't know anything about HTML, the Wikipedia article is a good summary. For the purposes of this exercise, you don't need to know anything about HTML beyond being able to look at it quickly.
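
To make "hierarchical" a little more concrete, here is a rough sketch that prints the nesting of tags in a tiny, made-up HTML string. It deliberately uses the html.parser module from the standard library instead of BeautifulSoup, so it runs without installing anything. The names OutlineParser, html_outline and sample_html are invented for this illustration, and the approach assumes tidy HTML where every opened tag is also closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they do not open a new level.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class OutlineParser(HTMLParser):
    """Record each opening tag together with how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) pairs, in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


def html_outline(html):
    """Return the (depth, tag) outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


sample_html = (
    "<html><head><title>Example</title></head>"
    "<body><h1>Hello</h1><p>First</p><p>Second</p></body></html>"
)

# Indent each tag by its depth to make the hierarchy visible.
for depth, tag in html_outline(sample_html):
    print("  " * depth + tag)
```

The indented printout shows title sitting inside head, and the h1 and p tags sitting inside body. Keeping track of that nesting by hand is exactly the bookkeeping a parsing library does for you.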

Because BeautifulSoup takes care of interpreting our HTML for us, we can ask it things like: "give me all the lines with <p> tags" or "find me the parent element to the <title> element", etc.

Your code would look something like this:

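from bs4 import BeautifulSoup

# some requests code here for getting r_html

# Name the parser explicitly; otherwise BeautifulSoup picks one and warns.
soup = BeautifulSoup(r_html, 'html.parser')
# First <span> whose class is "articletitle". The class name is only an
# example; look at the page's HTML to find the one you actually need.
# Note that find() returns None when nothing matches.
title = soup.find('span', 'articletitle').string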

And you can do many more things in BeautifulSoup, but I will leave you to explore those by yourself or through other later exercises.

Happy coding!


Explore away!

Wednesday, January 15, 2014

Week 0: Installing and Coding Python on Your Own Computer

You can code in Python on, at the very least, the three major operating systems (Mac, Windows, Linux) - I make no guarantees about Android, FirefoxOS, or ChromeOS. To effectively write code in Python, you need three things installed on your computer:

  1. The Python interpreter, or the program that will run the Python code you write
  2. A place to write your Python code, like a text editor or something fancier
  3. Something that can help you install interesting Python packages in the future (this is called a package manager)

If you use OS X or a flavor of Linux, you most likely already have the Python interpreter installed on your computer, and you have some kind of text editor to write your code in. If you have Windows, you need to install the Python interpreter no matter what. My recommendation for beginners is to install an IDE (integrated development environment) to write your code in. It is very common for Python-specific IDEs to come pre-packaged with Python, so all you need to do is install the IDE and start coding. The main difference between an IDE and a regular text editor is that an IDE gives a few more tools to the programmer - for example, you do not need to switch to a terminal to run your code if you use an IDE. The choice to use one is up to you, and I encourage you to try out a few different ways of writing Python code to see which one you like better.

I am not going to tell you about package managers for Python yet, but eventually you will need to install one to install interesting Python libraries to play with.


A Possible IDE for Python:

There are many options for IDEs / text editors that you can use - some more customizable, some more package-friendly, some more interactive, some more "hardcore". Long story short, it doesn't matter what IDE / text editor you use, as long as you have Python installed on your system and you know how to use it. I have listed a number of IDEs / editors here that you can use. The one I recommend for beginners is Enthought Canopy, just because it automatically installs Python, a pretty decent IDE, and a bunch of mathematical packages that are annoying to install manually. The same goes for Anaconda. But ultimately the choice is yours.

The IDE you choose is totally a matter of preference. If you are just starting, I recommend using one of the standard packaged IDEs (Enthought, Anaconda, IDLE) and switching between them. Try all of them and decide which one you like best. You might get to the point where you want to optimize your keystrokes or find yourself doing the same type of operation over and over again, and then a more customizable editor may be worth a look. But for 95% of ordinary users, these packaged IDEs work fantastically. Try one out, but be flexible. The one thing I advise against is paying for an IDE up front - there are plenty of free ones out there that work great, so don't pay for something until you're sure it's what you want.

What do I use, personally? On my Windows machine I have Enthought Canopy, on my Linux machine I use Sublime Text 3, and on a Mac I use Sublime as well. But I used to be a heavy IDLE user when I first started with Python.

Python 2 or Python 3:

You might have heard some of your coder friends talking about this new "Python 3" movement. The short story is this: Python 2.7 has been the industry standard for years, but it has a few major annoyances that cause developers problems. Python 3 was released in late 2008 and has been continuously improved ever since, but it has not yet become the industry standard. The main problem is "legacy libraries" and "legacy code", that is, code / packages that were written back in the Python 2.7 days and have not been translated into Python 3.

Because you are most likely not maintaining legacy code with Python, the recommendation is to learn Python 3 - it is the future of Python, and packages are slowly being ported to Python 3. All the exercises I will post here will assume you have Python 3 installed.
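
If you are not sure which version your IDE or terminal is actually running, a few lines like the following will tell you. This is only a quick sketch using the standard sys module; the function name running_python_3 is made up for this example.

```python
import sys


def running_python_3():
    """Return True if this interpreter is Python 3 or newer."""
    return sys.version_info[0] >= 3


# sys.version starts with the version number, followed by build details.
print("Python version: " + sys.version.split()[0])
if running_python_3():
    print("You are running Python 3, as the exercises assume.")
else:
    print("This is Python 2; the exercises here assume Python 3.")
```

Running this is also a handy first test that you know where to run your code from and where its output appears.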

Exercises: 

  1. Install Python 3 and an IDE / text editor of your choice
  2. Figure out how to open it, where to run your code from, and where the output of your code will be. Usually you can figure this out by reading the website of the product, watching YouTube videos, or searching the internet.