
I have a data lake where each container consists of folders named A, B, C, etc. Inside each folder, there are JSON files that I want to merge into one Parquet file, with some simple data transformations. Hence, there is one Parquet file per folder in the container. The data transformation for folder A is different from that for folders B and C, but all transformations result in a Parquet file.

As time goes on, new containers will be added with the same A, B, C folder structure and names, and JSON files will be updated.

I am considering using Synapse, Databricks, and Azure Data Factory (ADF) for this task. However, the challenging part is creating a trigger that updates the Parquet file when needed. The entire process should be automated, with the data transformation tool monitoring the data lake and deciding when to start the data transformation.

How can I automate this process to ensure that the Parquet files are updated whenever new data is added or existing data is modified? Any guidance on setting up triggers and automating the data transformation would be greatly appreciated.

Could ADF and Azure Functions be an option as well? Cost is something I would consider as important :D

I'm new to all these tools :D


1 Answer


If you are restricted to ADF only, you can try the approach below using two pipelines, six dataflows, and one trigger.

This design involves two scenarios:

Full load of the files:

For the first run, the existing JSON files need to be merged and the Parquet files created.

For this, you need three dataflows, one for each of the three folders.

Follow the below pipeline design:

Get Metadata activity -> Take a Binary dataset without any container or file path and select Child items in the activity, so it returns the list of containers.
For-Each activity -> Pass the child items list to the For-Each activity and enable the Sequential checkbox.
    - Dataflow 1 -> Create a JSON dataset with a dataset parameter for the container name and use it as the source of Dataflow 1. Pass the container name from the For-Each loop to the dataset parameter in the dataflow activity. In the dataflow source settings, give the wildcard file path A/*.json and add your folder A transformations after the source. In the sink, give your Parquet dataset; use dataset parameters for the Parquet file name if needed (see the sketch after this list).
    - Dataflow 2 -> Do the same for folder B: add the folder B transformations and give B/*.json as the wildcard file path.
    - Dataflow 3 -> Do the same for folder C: add the folder C transformations and give C/*.json as the wildcard file path.
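For reference, the merge-and-transform logic that Dataflow 1 performs for folder A could be sketched in PySpark as below. This is only an illustration of the full-load step; the container name, storage account, and `transform_a` function are placeholders, not names from your data lake.

```python
# Rough PySpark equivalent of Dataflow 1 for one container (full load).
# "mycontainer", "mystorageaccount", and transform_a() are placeholders:
# substitute your own container name, storage account, and folder-A transformation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

container = "mycontainer"
base = f"abfss://{container}@mystorageaccount.dfs.core.windows.net"

def transform_a(df):
    # Example folder-A transformation: drop rows with a null id and add a load timestamp.
    return df.filter(F.col("id").isNotNull()).withColumn("loaded_at", F.current_timestamp())

# Merge every JSON file under folder A into one DataFrame, transform it,
# and write a single Parquet output for that folder.
df_a = spark.read.json(f"{base}/A/*.json")
transform_a(df_a).coalesce(1).write.mode("overwrite").parquet(f"{base}/A_parquet")
```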

For Incremental load:

After the first load, use the below pipeline design.

Create parameters for the folder path and file name in the pipeline and pass the trigger parameters to them. Then create a Storage event trigger scoped to all containers and give .json as the "Blob path ends with" value.

Refer to this SO answer to learn how to pass trigger parameters to pipeline parameters.

Create another set of three dataflows for the incremental load.

Take a Parquet dataset with a dataset parameter for the container and give your folder A Parquet file name. Use this dataset as both Source1 and the sink. Create another JSON dataset with dataset parameters for both the container and the file name, and use it as Source2 in the dataflow. Apply a union transformation (by name) to Source1 and Source2, and write the result to the sink that reuses the Source1 dataset.
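The append-by-union logic that this incremental dataflow performs can be sketched in PySpark as below. The paths are placeholders that would come from the trigger; in ADF this is the Parquet-source plus JSON-source plus union-by-name dataflow described above, not separate code you need to deploy.

```python
# Sketch of the incremental dataflow's logic: read the existing Parquet output,
# read the newly uploaded JSON file, union by column name, and write back.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def append_file(parquet_path: str, new_json_path: str) -> None:
    existing = spark.read.parquet(parquet_path)
    new_rows = spark.read.json(new_json_path)
    # allowMissingColumns (Spark 3.1+) tolerates schema drift between old and new files.
    merged = existing.unionByName(new_rows, allowMissingColumns=True)
    # Write to a staging path first: overwriting the path we are still lazily
    # reading from would delete the source files before the read finishes.
    staging = parquet_path + "_staging"
    merged.coalesce(1).write.mode("overwrite").parquet(staging)
    spark.read.parquet(staging).write.mode("overwrite").parquet(parquet_path)

# Example call with hypothetical paths:
# append_file("abfss://container1@acct.dfs.core.windows.net/A_parquet",
#             "abfss://container1@acct.dfs.core.windows.net/A/new_file.json")
```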

Then follow the below pipeline design.

Set variable -> Extract the folder name from the pipeline parameter and store it in a string variable `folder` using the expression split(pipeline().parameters.folder_path,'/')[1].
Switch activity -> Give the `folder` variable as the switch expression.
 - Case `A` -> Give the case name as `A`.
    - Dataflow A -> Pass the folder name and file name parameters to the dataflow's dataset parameters.
 - Case `B` -> Give the case name as `B`.
    - Dataflow B -> Pass the folder name and file name parameters to the dataflow's dataset parameters.
 - Case `C` -> Give the case name as `C`.
    - Dataflow C -> Pass the folder name and file name parameters to the dataflow's dataset parameters.

For every file uploaded to any of your containers, the above pipeline will extract its folder name and append the file's data to the Parquet file for that folder.
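The Set variable plus Switch pair only models a small piece of routing logic; for clarity, this is what it does, expressed as plain Python (the folder names match your A/B/C layout, but the transformation functions are placeholders for the three dataflows):

```python
# What the Set variable + Switch activities do, expressed as plain Python.
# The trigger hands the pipeline a folder path such as "container1/A"; the
# second path segment decides which folder-specific transformation runs.

def transform_a(folder_path: str, file_name: str) -> None:
    print(f"folder A transformation for {folder_path}/{file_name}")  # placeholder for Dataflow A

def transform_b(folder_path: str, file_name: str) -> None:
    print(f"folder B transformation for {folder_path}/{file_name}")  # placeholder for Dataflow B

def transform_c(folder_path: str, file_name: str) -> None:
    print(f"folder C transformation for {folder_path}/{file_name}")  # placeholder for Dataflow C

ROUTES = {"A": transform_a, "B": transform_b, "C": transform_c}

def route(folder_path: str, file_name: str) -> None:
    # Same idea as split(pipeline().parameters.folder_path,'/')[1] in the Set variable activity.
    folder = folder_path.split("/")[1]
    handler = ROUTES.get(folder)
    if handler is None:
        raise ValueError(f"No transformation defined for folder {folder!r}")
    handler(folder_path, file_name)

# Example: route("container1/A", "new_file.json") runs the folder-A branch.
```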

As you have mentioned that the transformations are simple, you can try the above approach.

If the transformations are complex, then having a Databricks or Synapse notebook for the transformations is the better approach. You can use the ADF storage event trigger for the file upload/modification and call the Databricks or Synapse notebook, passing it the file path.
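As a rough illustration of that alternative, the notebook could accept the folder path and file name from ADF as parameters and do both the transformation and the Parquet append in one place. This is a minimal sketch: the parameter-passing mechanism, storage account, and paths are assumptions about your setup.

```python
# Hedged sketch of the notebook alternative: the ADF storage event trigger calls
# a Databricks/Synapse notebook activity and passes the folder path and file name
# of the JSON file that was just uploaded or modified.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# In Databricks these values would typically come from dbutils.widgets.get("folder_path")
# and dbutils.widgets.get("file_name"); in Synapse, from a parameters cell.
folder_path = "container1/A"      # placeholder value
file_name = "new_file.json"       # placeholder value

account = "mystorageaccount"      # placeholder storage account
container, folder = folder_path.split("/")[0], folder_path.split("/")[1]
base = f"abfss://{container}@{account}.dfs.core.windows.net"

new_rows = spark.read.json(f"{base}/{folder}/{file_name}")

# Apply the folder-specific transformation here (placeholder: pass-through).
transformed = new_rows

# Append to the folder's Parquet output; "append" avoids re-reading the existing
# data, at the cost of accumulating multiple part files over time.
transformed.write.mode("append").parquet(f"{base}/{folder}_parquet")
```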
