In Databricks, I ran VACUUM on all the tables in a loop. After the run completed successfully, I checked the history of each table for the "VACUUM START" operation and captured "sizeOfDataToDelete", which came to close to 1 TB. However, when I check the container metrics, the size is still the same as before the vacuum, so the space was not freed up. A sample of the output is attached below. What could be the issue?
Asked Mar 30 at 3:06 by user2703679

1 Answer
In Databricks, VACUUM removes only files that are no longer referenced by the Delta log. I am assuming you first used the DELETE command to delete the data you no longer need. If so, by default the deleted data is only marked as deleted, and the underlying files are not physically removed until 7 days have passed. That is the default retention period during which the files continue to exist even though they are no longer part of the Delta log. From the documentation (https://docs.databricks/aws/en/sql/language-manual/delta-vacuum):
VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM will skip all directories that begin with an underscore (_), which includes the _delta_log. Partitioning your table on a column that begins with an underscore is an exception to this rule; VACUUM scans all valid partitions included in the target Delta table. Delta table data files are deleted according to the time they have been logically removed from Delta’s transaction log plus retention hours, not their modification timestamps on the storage system. The default threshold is 7 days.
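The retention rule quoted above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not how Delta implements it: `eligible_for_vacuum` is a hypothetical helper, the timestamps are arbitrary, and treating the threshold as a strict "older than" comparison is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default retention threshold from the quoted docs: 7 days.
DEFAULT_RETENTION_HOURS = 7 * 24


def eligible_for_vacuum(removed_at, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Return True if a file logically removed from the transaction log at
    `removed_at` is older than the retention threshold at time `now`.

    Hypothetical helper: it models only the timing rule, using the logical
    removal time rather than the file's modification time on storage.
    """
    return removed_at + timedelta(hours=retention_hours) < now


# Arbitrary example timestamps, chosen only for illustration.
now = datetime(2024, 1, 20, tzinfo=timezone.utc)
removed_2_days_ago = now - timedelta(days=2)
removed_10_days_ago = now - timedelta(days=10)

print(eligible_for_vacuum(removed_2_days_ago, now))   # False: still inside the 7-day window
print(eligible_for_vacuum(removed_10_days_ago, now))  # True: past the 7-day window
```

Under this rule, data removed with DELETE two days before the VACUUM run stays on storage, while data removed ten days earlier becomes eligible for physical deletion.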