I am developing a Lambda architecture. In the batch layer, we send the events through S3 -> EMR -> ETL -> S3, and the idea is to then serve them through Redshift. On the other side, the speed layer receives the same events through Lambda -> DynamoDB.
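To make the speed layer cheap to read at query time, the Lambda can store pre-aggregated counters per key and hour instead of raw events. Below is a minimal sketch, assuming events arrive as dicts with hypothetical `key` and `ts` (epoch seconds) fields, and a DynamoDB table with a hypothetical composite key of `pk` (event key) and `hour` (hour bucket) plus a hypothetical `event_count` attribute. The request-building step is kept separate from the write so it can be checked without AWS:

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def hour_bucket(ts: int) -> int:
    """Truncate an epoch-seconds timestamp to the start of its hour."""
    return ts - ts % 3600


def build_updates(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn raw events into one update_item request per (key, hour).

    The field names `key`/`ts` and the table schema `pk`/`hour`/`event_count`
    are hypothetical placeholders for illustration.
    """
    # Pre-aggregate within the invocation: one write per (key, hour)
    # instead of one write per event.
    counts = Counter((e["key"], hour_bucket(e["ts"])) for e in events)
    return [
        {
            "Key": {"pk": key, "hour": hour},
            # ADD creates the number attribute if missing and increments it atomically.
            "UpdateExpression": "ADD event_count :n",
            "ExpressionAttributeValues": {":n": n},
        }
        for (key, hour), n in counts.items()
    ]


def write_events(events: Iterable[Dict[str, Any]], table: Any) -> int:
    """Apply the updates; `table` would be e.g. a boto3 DynamoDB Table resource."""
    updates = build_updates(events)
    for request in updates:
        table.update_item(**request)
    return len(updates)
```

One trade-off of this design: `ADD` counters are not idempotent, so if Lambda retries a batch after a partial failure, some counts can be applied twice. The batch layer, recomputed from S3, eventually corrects this for hours older than the cutoff.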
In the end we need to merge the data from both layers, and in our case we need to apply a number of WHERE clauses based on the aggregation of both ends. We first thought of using Athena to run the queries, reading from S3 and DynamoDB and merging the results. However, it takes a while to process the data from both sides and return a result.
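Because the WHERE clauses are based on the aggregation of both ends, the filter has to run after the two views are combined: filtering each side separately would drop a key whose batch and real-time totals only pass the threshold together. A minimal sketch, assuming each layer's result has already been fetched as a hypothetical `{key: count}` mapping and that the aggregate is additive (counts or sums; an average would need its sum and count kept separately):

```python
from collections import Counter
from typing import Callable, Dict, Mapping


def merge_views(batch: Mapping[str, int], speed: Mapping[str, int]) -> Dict[str, int]:
    """Add the batch and real-time aggregates key by key."""
    merged = Counter(batch)
    merged.update(speed)  # Counter.update adds counts rather than replacing them
    return dict(merged)


def filter_merged(merged: Mapping[str, int],
                  predicate: Callable[[int], bool]) -> Dict[str, int]:
    """Apply a HAVING-style condition to the combined totals."""
    return {k: v for k, v in merged.items() if predicate(v)}


# Key "a" passes "> 100" only after merging (60 + 50).
merged = merge_views({"a": 60, "b": 200}, {"a": 50, "c": 5})
print(filter_merged(merged, lambda v: v > 100))  # {'a': 110, 'b': 200}
```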
Since the real-time data isn't supposed to differ much within the last hour (which is the time window we use between the two layers), we thought of joining the data at the application level, but I couldn't find any reference on how to do it.
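Joining at the application level mostly comes down to choosing a cutoff: the batch view is trusted for everything before the first hour the EMR job has not yet processed, and the real-time view only contributes hour buckets from that point on, so no event is counted twice or skipped. A minimal sketch, assuming the speed layer returns hypothetical `(key, hour_bucket, count)` rows and that the batch job records a hypothetical `batch_cutoff` (epoch seconds of the first unprocessed hour):

```python
from typing import Dict, Iterable, Mapping, Tuple

# (key, hour bucket start in epoch seconds, count); a hypothetical row shape
SpeedRow = Tuple[str, int, int]


def serve(batch_totals: Mapping[str, int],
          speed_rows: Iterable[SpeedRow],
          batch_cutoff: int) -> Dict[str, int]:
    """Combine batch totals with the real-time rows at or after the cutoff."""
    result = dict(batch_totals)
    for key, hour, count in speed_rows:
        # Hours before the cutoff are already included in the batch view.
        if hour >= batch_cutoff:
            result[key] = result.get(key, 0) + count
    return result
```

Since rows older than the cutoff are ignored, they could be expired with a DynamoDB TTL attribute once the batch view has caught up, which keeps the speed-layer reads small.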
Do you know any libraries or good practices for doing this at the application level? Do you have any other suggestions that may be easier or more efficient for merging these two layers?
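One low-effort improvement at the application level is to issue the Redshift query and the DynamoDB read concurrently, so the response time is roughly that of the slower source rather than the sum of both. A minimal sketch using only the standard library; `fetch_batch` and `fetch_speed` are hypothetical placeholders for the real calls (for example a Redshift driver or the Redshift Data API, and a DynamoDB query), each assumed to return a `{key: count}` mapping:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Mapping

# A zero-argument callable returning {key: count}; hypothetical interface.
Fetcher = Callable[[], Mapping[str, int]]


def fetch_both(fetch_batch: Fetcher, fetch_speed: Fetcher,
               timeout: float = 10.0) -> Dict[str, int]:
    """Run both reads in parallel threads and add their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        batch_future = pool.submit(fetch_batch)
        speed_future = pool.submit(fetch_speed)
        # result() re-raises any exception raised inside the worker thread.
        batch = batch_future.result(timeout=timeout)
        speed = speed_future.result(timeout=timeout)
    merged = dict(batch)
    for key, count in speed.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

Threads suit this case because both calls are network-bound; the merge itself is the same additive combination as in a single-threaded join.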
Source: amazon s3 - Aggregate data from Real Time and ETL(S3+Dynamo) - Stack Overflow (user-contributed content, mirrored at http://www.betaflare.com/web/1743776844a2537136.html)