Lesson 2: Hadoop HDFS basics
Learning Objectives:
- Navigate and interact with the HDFS file system
Lesson Overview:
In this lesson, students learn the basics of the Hadoop Distributed File System (HDFS) and how to navigate it from the command line.
Key Concepts:
- HDFS commands for navigation
By the end of this lesson, students will understand how HDFS paths are structured and be able to navigate the HDFS file system with basic shell commands.
Concepts
Concept 2.1: Understanding HDFS Path Structure
HDFS (Hadoop Distributed File System) follows a hierarchical directory structure similar to traditional file systems. Each file or directory in HDFS is identified by a unique path that starts at the root directory "/". Paths in HDFS use Unix-like notation, with forward slashes ("/") separating the components.
- Root Directory: The root directory in HDFS is denoted by "/". All other directories and files are contained within this root directory.
- Absolute Path: An absolute path in HDFS specifies the complete path starting from the root directory. For example, "/user/hadoop/input/file.txt" is an absolute path.
Understanding the structure of paths in HDFS is essential for navigating through the file system and performing various file operations.
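Because HDFS paths use the same slash-separated notation as Unix paths, ordinary shell tools can take one apart. The following is a minimal sketch, not part of the original lesson, that checks whether a path is absolute and splits it into its parent directory and file name. It runs locally and does not contact a Hadoop cluster; the variable names (path, kind, parent, name) are illustrative assumptions.

```shell
#!/bin/sh
# Example absolute path taken from the lesson text above.
path="/user/hadoop/input/file.txt"

# An absolute HDFS path always begins at the root directory "/".
case "$path" in
  /*) kind="absolute" ;;
  *)  kind="relative" ;;
esac

# Split the path into the containing directory and the final component.
parent=$(dirname "$path")
name=$(basename "$path")

echo "kind:   $kind"
echo "parent: $parent"
echo "name:   $name"
```

For the example path, the script reports the path as absolute, with parent directory /user/hadoop/input and file name file.txt.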
Code Sample
# List contents of the root directory in HDFS
hadoop fs -ls /
# Create a new directory in HDFS (-p also creates any missing parent directories)
hadoop fs -mkdir -p /user/hadoop/data
# Remove an empty directory in HDFS (use "hadoop fs -rm -r" for a directory that still has contents)
hadoop fs -rmdir /user/documents/files
# Copy a file from the local Linux file system into HDFS (local source first, then HDFS destination)
hadoop fs -put /user/documents/myfile.csv /user/documents/myfile.csv
# Read the contents of a file in HDFS
hadoop fs -cat /user/documents/myfile.csv