
I'm executing multiple workflows, each with multiple parameters, and would like to post-process the results by adding them to a database that can then be queried (e.g. a knowledge graph or MongoDB). I want to answer the following questions:

  1. What parameters was the workflow my_workflow1 ever executed with?
  2. What was y in function_two (a task in the workflow) when my_workflow2 was executed with parameters (1,2)?
  3. Given a task that takes a file_name as input, what was the content of that file when my_workflow was executed with parameters (1,2)?
  4. What was the version number/conda environment/git commit for the task function_two in the execution of my_workflow that resulted in output=1?
  5. Which workflow executions used a Docker container "xyz" for a specific task (or, in a similar setup, a scipy version < 1.15)?
  6. How long did the execution of my_workflow with parameters (1,2) take, and what were the most expensive execution steps?
  7. What was the architecture of the node the most expensive step was executed on (e.g. number of cores, memory)?

For that purpose, I want to generate a report in JSON (or similar) that keeps track of all the information generated during the workflow execution in a machine-readable way, potentially with a schema/ontology. Is there a way to use the report feature but write a machine-readable file rather than an HTML report?
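As a sketch of what such a machine-readable record could look like, each workflow execution could be stored as one JSON document; all field names below are illustrative, not from any existing Snakemake schema. With records of this shape, questions like 1 and 6 become simple queries:

```python
import json

# Hypothetical per-execution record; every field name here is an
# assumption chosen to cover the questions above.
record = {
    "workflow": "my_workflow1",
    "params": {"a": 1, "b": 2},
    "tasks": [
        {
            "name": "function_two",
            "inputs": {"y": 42, "file_name": "data.txt"},
            # file content captured at run time (question 3)
            "input_file_contents": {"data.txt": "example content"},
            "env": {"conda_env": "env.yaml", "git_commit": "abc123",
                    "container": "xyz", "scipy_version": "1.14.1"},
            "runtime_s": 12.7,
            "node": {"cores": 32, "memory_gb": 128},
        }
    ],
    "total_runtime_s": 13.9,
}

# In practice this list would be a MongoDB collection or a graph store.
executions = [record]

# Question 1: all parameter sets my_workflow1 was ever executed with
param_sets = [e["params"] for e in executions
              if e["workflow"] == "my_workflow1"]

# Question 6: most expensive task of one execution
most_expensive = max(record["tasks"], key=lambda t: t["runtime_s"])

print(json.dumps(param_sets))
print(most_expensive["name"])
```

The same filters translate directly into MongoDB queries (`find({"workflow": "my_workflow1"})`) or graph traversals once the documents are loaded.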


asked Mar 31 at 4:46 by Ferdi

1 Answer


For this level of detail you'll need to write a custom reporter plug-in. A custom plugin can introspect the whole workflow, and it can also examine the input/output files themselves (needed for question 3), which a generic plugin is not going to do.

You could then export all the info as JSON, or you could have your plugin update your database directly.

See:

https://github.com/snakemake/snakemake-interface-report-plugins/tree/main

And here is an example of a plugin:

https://github.com/UoMResearchIT/ro-crate_snakemake_tooling/tree/develop/snakemake-report-plugin-wrroc/snakemake_report_plugin_wrroc

Note - when making a test plugin I found that using Poetry, as suggested there, was more of a hindrance than a help, but YMMV.
