I'm executing multiple workflows, each with multiple parameters, and would like to post-process the results by adding them to a database that is then queried (e.g. a knowledge graph or MongoDB). I want to answer questions such as:
- What were the parameters that the workflow my_workflow1 was ever executed with?
- What was y in function_two (a task in the workflow) when my_workflow2 was executed with parameters (1, 2)?
- Assuming a task takes a file_name as input, what was the content of that file when my_workflow was executed with parameters (1, 2)?
- What was the version number / conda environment / git commit for the task function_two in the execution of my_workflow that resulted in output=1?
- Which workflow executions used a Docker container "xyz" for a specific task (or, in a similar vein, a scipy version < 1.15)?
- How long did the execution of my_workflow (1, 2) take, and what were the most expensive execution steps?
- What was the architecture of the node the most expensive step was executed on (e.g. number of cores, memory)?
For that purpose, I want to generate a report in JSON (or a similar format) that keeps track of all the information generated during the execution of the workflow in a machine-readable way, potentially with a schema/ontology. Is there an option to use the report feature, but writing a machine-readable file rather than an HTML report?
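For concreteness, this is roughly the kind of record I would like to end up with per workflow execution and insert into e.g. MongoDB. All field names here are just illustrative, not an existing schema:

```python
# Hypothetical shape of one per-execution record; every key is
# illustrative and not part of any existing Snakemake report schema.
run_record = {
    "workflow": "my_workflow",
    "parameters": {"a": 1, "b": 2},
    "started": "2025-03-31T04:46:00Z",
    "duration_seconds": 1234.5,
    "tasks": [
        {
            "name": "function_two",
            "params": {"y": 42},
            "inputs": [{"path": "data/input.txt", "sha256": "..."}],
            "outputs": [{"path": "results/out.txt", "value": 1}],
            "software": {
                "conda_env": "envs/analysis.yaml",
                "container": "docker://xyz",
                "git_commit": "abc123",
                "packages": {"scipy": "1.14.1"},
            },
            "resources": {"runtime_seconds": 987.6, "cores": 16, "mem_gb": 64},
        }
    ],
}
```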
asked Mar 31 at 4:46 by Ferdi

1 Answer
For this level of detail you'll need to write a custom reporter plug-in. It can introspect the whole workflow, and you can also examine the input/output files themselves (needed for question 3), which a generic plugin is not going to do.
You could then export all the info as JSON, or you could have your plugin update your database directly.
See:
https://github.com/snakemake/snakemake-interface-report-plugins/tree/main
And here is an example of a plugin:
https://github.com/UoMResearchIT/ro-crate_snakemake_tooling/tree/develop/snakemake-report-plugin-wrroc/snakemake_report_plugin_wrroc
Note - when making a test plugin I found that using poetry, as suggested here, was more of a hindrance than a help, but YMMV.
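For orientation, here is a minimal, untested sketch of what such a plug-in could look like. It follows the general structure of the interface repo above (a settings dataclass plus a Reporter class implementing render()), but the exact attributes exposed on the reporter (jobs, rules, and their fields) are assumptions written from memory, so verify them against the interface source and the WRROC example:

```python
# Rough sketch of a JSON-emitting report plug-in. Class and module names
# follow snakemake-interface-report-plugins; the attributes read from
# `self` below (rules, jobs, ...) are assumptions to check against the
# interface definition and the example plug-in linked above.
import json
from dataclasses import dataclass, field
from typing import Optional

from snakemake_interface_report_plugins.reporter import ReporterBase
from snakemake_interface_report_plugins.settings import ReportSettingsBase


@dataclass
class ReportSettings(ReportSettingsBase):
    # Custom CLI setting, e.g. where to write the JSON file.
    path: Optional[str] = field(
        default="report.json",
        metadata={"help": "Path of the JSON report to write."},
    )


class Reporter(ReporterBase):
    def __post_init__(self):
        pass

    def render(self):
        # Walk the workflow information exposed by the interface and
        # flatten it into plain dicts (or push it straight into your
        # database here instead). Attribute names are assumptions.
        report = {
            "rules": sorted(self.rules.keys()),
            "jobs": [
                {
                    "rule": job.rule,
                    # Inspect the job records in the interface to see what
                    # else is available (params, wildcards, runtimes,
                    # conda env, container, ...).
                }
                for job in self.jobs
            ],
        }
        with open(self.settings.path, "w") as f:
            json.dump(report, f, indent=2, default=str)
```

If I remember right, recent Snakemake versions let you select the reporter on the command line (something like --reporter <name>, with the settings dataclass fields exposed as extra options), but check the interface README for the exact invocation and entry-point naming.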