Hadoop Programs in Python with Pydoop

Gaston Hillar provides a good tutorial on Pydoop, a Python API for Hadoop.

Pydoop Script enables you to write simple MapReduce programs for Hadoop with mapper and reducer functions in just a few lines of code...
Pydoop might not be the best API for all Hadoop use cases, but its unique features make it suitable for specific scenarios and it is being actively improved.
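
To give a feel for the "few lines of code" claim, here is a minimal word-count sketch in the style of a Pydoop Script module. The `mapper` and `reducer` signatures and the `writer.emit` call follow the convention Pydoop Script documents for its word-count example. `LocalWriter` and `run_locally` are hypothetical helpers, not part of Pydoop, added only so the two functions can be tried on a laptop without a cluster.

```python
from collections import defaultdict


def mapper(_, text, writer):
    # Pydoop Script calls this once per input line; the key is ignored.
    for word in text.split():
        writer.emit(word, 1)


def reducer(word, icounts, writer):
    # icounts iterates over every count emitted for this word.
    writer.emit(word, sum(map(int, icounts)))


class LocalWriter:
    """Hypothetical stand-in for the writer object Pydoop passes in."""

    def __init__(self):
        self.records = []

    def emit(self, key, value):
        self.records.append((key, value))


def run_locally(lines):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    map_out = LocalWriter()
    for line in lines:
        mapper(None, line, map_out)

    # Group the mapper output by key, as the shuffle phase would.
    grouped = defaultdict(list)
    for key, value in map_out.records:
        grouped[key].append(value)

    reduce_out = LocalWriter()
    for key in sorted(grouped):
        reducer(key, iter(grouped[key]), reduce_out)
    return dict(reduce_out.records)


if __name__ == "__main__":
    print(run_locally(["to be or not to be", "to do"]))
```

On a real cluster only `mapper` and `reducer` would go into the script file, which, assuming a file named wordcount.py, would typically be submitted with something like `pydoop script wordcount.py <hdfs_input> <hdfs_output>`.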


Pydoop wraps Hadoop Pipes and allows you to access the most important MapReduce components, such as Partitioner, RecordReader, and RecordWriter. In addition, Pydoop makes it easy to interact with HDFS (Hadoop Distributed File System) through a Pydoop HDFS API (pydoop.hdfs), which allows you to retrieve information about directories, files, and several file system properties. The Pydoop HDFS API makes it possible to easily read and write files within HDFS by writing Python code. The lower-level API also provides features similar to the Hadoop C HDFS API, so you can use it to build statistics of HDFS usage.
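
As a rough illustration of the HDFS side, the sketch below writes a file, reads it back, and adds up file sizes in a directory. It assumes the `dump`, `load`, and `lsl` functions of `pydoop.hdfs`, and assumes that each `lsl` entry is a dict with `"kind"` and `"size"` keys; check these against the Pydoop version you have installed. The helper names `round_trip` and `total_bytes`, the `fs` parameter, and the path in the usage lines are hypothetical, chosen so the helpers can also be exercised with a stand-in object when no cluster is available.

```python
try:
    import pydoop.hdfs as hdfs  # needs Pydoop and a reachable HDFS
except ImportError:
    hdfs = None  # lets the helpers below be used with a stand-in object


def round_trip(path, text, fs=hdfs):
    """Write text to an HDFS path, then read it back (hypothetical helper)."""
    fs.dump(text, path)
    return fs.load(path)


def total_bytes(directory, fs=hdfs):
    """Sum the sizes of the files directly under a directory.

    Assumes each entry returned by lsl() is a dict with "kind" and "size".
    """
    return sum(
        entry["size"]
        for entry in fs.lsl(directory)
        if entry["kind"] == "file"
    )


if __name__ == "__main__" and hdfs is not None:
    # Hypothetical path; adjust to a directory you can write to.
    print(round_trip("/tmp/pydoop_demo.txt", "hello from Pydoop\n"))
    print(total_bytes("/tmp"))
```

Passing the file system module as a parameter is only a convenience for trying the helpers out; in ordinary scripts calling `hdfs.dump`, `hdfs.load`, and `hdfs.lsl` directly would be just as reasonable.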

source: http://www.drdobbs.com/database/pydoop-writing-hadoop-programs-in-python/240156473