Move data from one component to the next in Azure Machine Learning

I have 2 components in Azure Machine Learning. The first component (called prep) produces 2 dataframes that I want to pass into the next component (called middle) for further processing.

In the prep code, I have tried to save the dataframe into the component's output section, into a datastore, and into the args location passed in as an input parameter, as shown below:

print((Path(args.Y_df) / "Y_df.csv"))
df1.to_csv("./outputs/Y_df.csv")
df1.to_csv(args.Y_df.path)
df1.to_csv("azureml://subscriptions/subscription_id/resourcegroups/rg_group/workspaces/workspace_name/datastores/datastore_name/paths/azureml/forecast/testing/y_df.csv")

Out of these, only the first method works. Now I want to pass this output into the next component, so in the pipeline definition code I have this:

def data_pipeline(
    compute_train_node: str,
):

    prep_node = prep()
    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)
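
For reference, the components are loaded and the pipeline is submitted roughly like this (a sketch; the YAML file names, compute name, and workspace config below are placeholders rather than my exact values):

# Sketch of how the components are loaded and the pipeline submitted.
# File names, compute name and workspace config are placeholders.
from azure.ai.ml import MLClient, load_component
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Load the two command components from their YAML definitions.
prep = load_component(source="./prep.yaml")
middle = load_component(source="./middle.yaml")

@pipeline(default_compute="cpu-cluster")
def data_pipeline(compute_train_node: str):
    prep_node = prep()
    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)

# Submit the pipeline job to the workspace.
ml_client.jobs.create_or_update(data_pipeline(compute_train_node="cpu-cluster"))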

I am trying to run some basic code in the middle component, but it does not even get started; it fails with an error.

Below are the YAMLs for prep and middle:

middle:

name: middle4
display_name: middle4

inputs:
  Y_df:
    type: uri_file

  S_df:
    type: uri_file

code: ./middle

environment: azureml:environment_name:4

command: >-
  python middle_script.py
  --Y_df ${{inputs.Y_df}}
  --S_df ${{inputs.S_df}}

prep:

name: preprocessing24
display_name: preprocessing24

outputs:
  Y_df:
    type: uri_file

  S_df:
    type: uri_file

code: ./preprocessing

environment: azureml:environment_name:4

command: >-
  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

What am I doing wrong? How do I pass a file from one component to the other?

Edit after trying out the method in the answer:

As of now, args.Y_df points to some seemingly random (probably default) file path instead of the one I gave it via the Output() function mentioned in the answer. It then fails with this error:

OSError: Cannot save file into a non-existent directory: '/mnt/azureml/cr/j/32h438dshj537dj284ndhs630e1/cap/data-capability/wd/Y_df/testing'

Below is the code I have written for getting the path into the prep code. This path is used to save the dataframes as CSV.

parser = argparse.ArgumentParser("prep")
parser.add_argument("--Y_df", type=str, help="Path of prepped data")
parser.add_argument("--S_df", type=str, help="Path of prepped data")
parser.add_argument("--clinical_actuals_path", type=str, help="Path of prepped data")
args = parser.parse_args()
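
For clarity, the write step in prep then looks roughly like this (a sketch; creating the directory first is my guess at avoiding the OSError above, assuming args.Y_df is meant to be a folder-type output):

# Sketch of the write step, assuming args.Y_df is a mounted output folder.
from pathlib import Path

out_dir = Path(args.Y_df)
out_dir.mkdir(parents=True, exist_ok=True)  # guard against the non-existent-directory error
df1.to_csv(out_dir / "Y_df.csv", index=False)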

  • You need to configure the output path in prep_node, like prep_node.outputs.Y_df = "path", before passing it into the further component. – JayashankarGS Commented 21 hours ago
  • Is that done in the YAML or in the prep() definition in Python? Because I did prep_node = prep(Y_df="./outputs/Y_df.csv", S_df="./outputs/S_df.csv") and this is giving me an error: UnexpectedKeywordError: [component] preprocessing24() got an unexpected keyword argument 'Y_df'. – Ameya Bhave Commented 20 hours ago
  • Don't pass it as a parameter. After prep_node = prep(), add prep_node.outputs.Y_df = "path1" and prep_node.outputs.S_df = "path2", then pass them on: transform_node = middle(Y_df=prep_node.outputs.Y_df, S_df=prep_node.outputs.S_df) – JayashankarGS Commented 20 hours ago
  • Try giving a datastore path instead of a local ./ path. – JayashankarGS Commented 20 hours ago
  • It fails at middle(Y_df=prep_node.outputs.Y_df, S_df=prep_node.outputs.S_df) with an error saying AttributeError: 'str' object has no attribute '_mode' – Ameya Bhave Commented 20 hours ago

1 Answer


You have to give a datastore path to the output of prep_node, like below.

from azure.ai.ml import MLClient, Input, Output

def data_pipeline(
    compute_train_node: str,
):

    prep_node = prep()

    prep_node.outputs.Y_df = Output(type="uri_folder", path="azureml://datastores/<datastore_name>/paths/csvs/Y_df/")
    prep_node.outputs.S_df = Output(type="uri_folder", path="azureml://datastores/<datastore_name>/paths/csvs/S_df/")

    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)

Here, I am giving an Output object with a datastore path to Y_df and S_df.

Next, save the CSV files in the prep component like below.

df1.to_csv(Path(args.Y_df) / "Y_df.csv")

df2.to_csv(Path(args.S_df) / "S_df.csv")
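
On the middle side, the files can be read back like this (a sketch, assuming middle declares its inputs as uri_folder as well, so args.Y_df and args.S_df resolve to folder paths):

# Sketch of middle_script.py reading the two CSVs written by prep.
import argparse
from pathlib import Path
import pandas as pd

parser = argparse.ArgumentParser("middle")
parser.add_argument("--Y_df", type=str, help="Folder containing Y_df.csv")
parser.add_argument("--S_df", type=str, help="Folder containing S_df.csv")
args = parser.parse_args()

Y_df = pd.read_csv(Path(args.Y_df) / "Y_df.csv")
S_df = pd.read_csv(Path(args.S_df) / "S_df.csv")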

If you want to save the 2 files in a single folder, give a single output to the prep component and access them through that folder in the next component.
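
A sketch of that single-folder variant (the output name prepped_data and the datastore path are illustrative, not from the original components):

# In the pipeline definition: one uri_folder output holds both CSVs.
prep_node.outputs.prepped_data = Output(
    type="uri_folder",
    path="azureml://datastores/<datastore_name>/paths/csvs/prepped/",
)
transform_node = middle(prepped_data=prep_node.outputs.prepped_data)

# In the prep script: write both files into that folder.
out_dir = Path(args.prepped_data)
df1.to_csv(out_dir / "Y_df.csv", index=False)
df2.to_csv(out_dir / "S_df.csv", index=False)

# In the middle script: read both files back from the single folder input.
Y_df = pd.read_csv(Path(args.prepped_data) / "Y_df.csv")
S_df = pd.read_csv(Path(args.prepped_data) / "S_df.csv")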
