The documentation of the Hadoop Job API gives the following as an example:

From https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/Job.html

Here is an example on how to submit a job:

     // Create a new Job
     Job job = Job.getInstance();
     job.setJarByClass(MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     job.setInputPath(new Path("in"));
     job.setOutputPath(new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     job.waitForCompletion(true);

It seems natural to attach the input and output paths to the job.

However, the setInputPath() and setOutputPath() methods do not exist in the API (see the rest of the documentation page).

And this example actually does not compile.

The correct way to set the paths is to replace those two lines with:

FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
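
For completeness, here is a minimal sketch of what the corrected snippet looks like in full, following the documentation's example (MyJob, MyMapper and MyReducer are the placeholder classes from that example, not real classes):

     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.mapreduce.Job;
     import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
     import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

     // Create a new Job
     Job job = Job.getInstance();
     job.setJarByClass(MyJob.class);
     job.setJobName("myjob");

     // Input and output paths are set via the format classes, not via the Job itself
     FileInputFormat.addInputPath(job, new Path("in"));
     FileOutputFormat.setOutputPath(job, new Path("out"));

     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     job.waitForCompletion(true);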

The use of these methods (documented here [1]) seems rather inelegant to me, as I can't see why this cannot be a method of Job (which is what the example in the documentation seems to intend).

My question is: can an expert

  • explain the rationale, and
  • confirm that the documentation is wrong?

[1] https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html


1 Answer

My "non expert" opinion - if the code compiles, it is not wrong. If it does not, then it's worthy of a JIRA ticket to the Apache Hadoop project (maybe one already exists).

Just because the Javadoc is missing (somehow) doesn't necessarily mean it's wrong. Open-source documentation is hard, and example code embedded in comments is not (regularly) checked for compilation errors...

In any case, I don't know a single person who explicitly writes MapReduce API code anymore without a higher-level abstraction, so what's the use case you're trying to solve?
