I have two unbounded sources (Pub/Sub):

  • main source: emits values frequently
  • secondary source: emits an event whenever a BigQuery table has changed, telling us to read that table again.

I want to enrich the main source by left-joining it with the table contents, which are re-read whenever the secondary source emits an event.

I already have a solution in which the BigQuery tables are read once at the beginning, so they are bounded. I implemented the join with Beam SQL because it is quite complex, and I want to keep it that way. For that reason I don't think a side input is feasible: as far as I can tell, Beam SQL cannot join a PCollection with a PCollectionView.

I tried applying a 5-second fixed window to each source, but the last state of the secondary source is not propagated to windows in which nothing has changed. As a result, after joining the sources I get the right results only for windows in which the BigQuery table was updated. When nothing has changed (most of the time), I get null values on the right side.
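To make the symptom concrete, here is a minimal plain-Java model of a per-window left join. It is not Beam code, and `WindowedLeftJoinModel`, `windowOf`, and `leftJoin` are names made up for this illustration. It assumes at most one table update per 5-second window and represents the table contents as a simple version string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a left join over 5-second fixed windows (hypothetical, not Beam).
final class WindowedLeftJoinModel {
    static final long WINDOW_MILLIS = 5_000L;

    // Index of the fixed window that contains the given timestamp.
    static long windowOf(long timestampMillis) {
        return Math.floorDiv(timestampMillis, WINDOW_MILLIS);
    }

    // Joins each main event with the table version that arrived in the SAME window.
    // Windows without a table update yield "null" on the right side.
    static List<String> leftJoin(List<Long> mainTimestamps,
                                 Map<Long, String> tableVersionByTimestamp) {
        Map<Long, String> versionByWindow = new HashMap<>();
        for (Map.Entry<Long, String> update : tableVersionByTimestamp.entrySet()) {
            versionByWindow.put(windowOf(update.getKey()), update.getValue());
        }
        List<String> joined = new ArrayList<>();
        for (long ts : mainTimestamps) {
            joined.add(ts + "->" + versionByWindow.get(windowOf(ts)));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Main events at 1 s, 6 s and 12 s; a single table update at 2 s.
        System.out.println(leftJoin(List.of(1_000L, 6_000L, 12_000L), Map.of(2_000L, "v1")));
    }
}
```

Only the main event that shares a window with the update (the one at 1 s) gets a table version. The events at 6 s and 12 s fall into windows with no secondary element, so their right side is null, which matches what I see in the pipeline.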

How can I upsample the secondary source so that the join produces the right results in every window?
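By "upsample" I mean forward-filling: every window should see the most recent table version at or before it, not only the windows in which an update arrived. The sketch below shows the intended semantics in plain Java. It is not Beam code and not a proposed implementation; `ForwardFillModel` and `joinWithLatest` are hypothetical names, and the table contents are again reduced to a version string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the desired forward-fill semantics (hypothetical, not Beam).
final class ForwardFillModel {
    static final long WINDOW_MILLIS = 5_000L;

    // Index of the fixed window that contains the given timestamp.
    static long windowOf(long timestampMillis) {
        return Math.floorDiv(timestampMillis, WINDOW_MILLIS);
    }

    // Joins each main event with the latest table version from its own window
    // or any earlier window. The right side is "null" only before the first update.
    static List<String> joinWithLatest(List<Long> mainTimestamps,
                                       Map<Long, String> tableVersionByTimestamp) {
        // Iterate updates in timestamp order so a later update in the same window wins.
        TreeMap<Long, String> versionByWindow = new TreeMap<>();
        for (Map.Entry<Long, String> update : new TreeMap<>(tableVersionByTimestamp).entrySet()) {
            versionByWindow.put(windowOf(update.getKey()), update.getValue());
        }
        List<String> joined = new ArrayList<>();
        for (long ts : mainTimestamps) {
            // floorEntry returns the entry with the greatest window index <= this window.
            Map.Entry<Long, String> latest = versionByWindow.floorEntry(windowOf(ts));
            joined.add(ts + "->" + (latest == null ? null : latest.getValue()));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Main events at 1 s, 6 s and 12 s; table updates at 2 s and 11 s.
        System.out.println(joinWithLatest(
            List.of(1_000L, 6_000L, 12_000L),
            Map.of(2_000L, "v1", 11_000L, "v2")));
    }
}
```

Here the event at 6 s still sees `v1` even though its window contains no update, and the event at 12 s picks up `v2`. This is the behaviour I would like to get out of the Beam SQL join.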

Tags: java · Join a rapidly and slowly changing unbounded sources in Apache Beam · Stack Overflow