Lesson 2: Hadoop HDFS basics

Learning Objectives:

  • Navigate and interact with the HDFS file system

Lesson Overview:

In this lesson, students learn the basics of the Hadoop Distributed File System (HDFS) and how to navigate it.

Key Concepts:

  • HDFS commands for navigation

By the end of this lesson, students should understand the basics of HDFS and be able to navigate it from the command line.

Concepts

Concept 2.1: Understanding HDFS Path Structure

HDFS (Hadoop Distributed File System) uses a hierarchical directory structure similar to that of traditional file systems. Each file or directory in HDFS is identified by a unique path that starts from the root directory "/". Paths are written in Unix-like notation, with components separated by forward slashes ("/").

  • Root Directory: The root directory in HDFS is denoted by "/". All other directories and files are contained within this root directory.
  • Absolute Path: An absolute path in HDFS specifies the complete path starting from the root directory. For example, "/user/hadoop/input/file.txt" is an absolute path.

Understanding the structure of paths in HDFS is essential for navigating through the file system and performing various file operations.
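
Because HDFS paths use Unix-like notation, the standard shell utilities dirname and basename can illustrate how a path breaks down into a parent directory and a final component. The following is a minimal sketch that only manipulates path strings locally and never contacts an HDFS cluster; the function names hdfs_parent, hdfs_name, and is_absolute are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Illustrative only: these helpers work on path strings locally
# and do not talk to HDFS.

# Print the parent directory of a path.
hdfs_parent() {
    dirname "$1"
}

# Print the final component (file or directory name) of a path.
hdfs_name() {
    basename "$1"
}

# Succeed only if the path is absolute, i.e. starts at the root "/".
is_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

path="/user/hadoop/input/file.txt"
echo "parent: $(hdfs_parent "$path")"
echo "name: $(hdfs_name "$path")"
if is_absolute "$path"; then
    echo "absolute: yes"
fi
```

For the example path, the parent is "/user/hadoop/input" and the final component is "file.txt"; applying hdfs_parent repeatedly walks back up to the root directory "/".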

Code Sample

# List contents of the root directory in HDFS
hadoop fs -ls /

# Create a new directory in HDFS (-p also creates any missing parent directories)
hadoop fs -mkdir -p /user/hadoop/data

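# Remove (delete) an empty directory in HDFS
# (-rmdir fails on a non-empty directory; use "hadoop fs -rm -r" to delete a directory and its contents)
hadoop fs -rmdir /user/documents/files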

# Copy a file from the local Linux file system into HDFS (local source first, then HDFS destination)
hadoop fs -put /user/documents/myfile.csv /user/documents/myfile.csv

# Read the contents of a file in HDFS
hadoop fs -cat /user/documents/myfile.csv
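
The commands above can be combined into small reusable helpers. The following is a minimal sketch, not a definitive implementation: it checks whether an HDFS directory exists with "hadoop fs -test -d" and creates it with "hadoop fs -mkdir -p" only when it is missing. The function name hdfs_ensure_dir and the HADOOP_BIN variable are assumptions introduced for illustration; HADOOP_BIN simply lets the hadoop command be swapped out, for example on a machine without a cluster.

```shell
#!/bin/sh
# HADOOP_BIN is an assumed, overridable variable; it defaults to the hadoop CLI.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# Hypothetical helper: create an HDFS directory only if it does not exist yet.
hdfs_ensure_dir() {
    # "fs -test -d" exits with status 0 when the path exists and is a directory.
    if "$HADOOP_BIN" fs -test -d "$1"; then
        echo "exists: $1"
    else
        # -p also creates any missing parent directories.
        "$HADOOP_BIN" fs -mkdir -p "$1" && echo "created: $1"
    fi
}
```

On a machine with a configured Hadoop client, calling hdfs_ensure_dir /user/hadoop/data would print whether the directory already existed or had to be created.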