Troubleshooting the test_migrate_external_table_hiveserde_in_place Test Failure

by JurnalWarga.com
AssertionError: parquet_serde_dour not found in dummy_cgths.hiveserde_in_place_dour
assert False
[gw3] linux -- Python 3.10.18 /home/runner/work/ucx/ucx/.venv/bin/python
05:15 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_slkt6.tables] fetching tables inventory
05:15 DEBUG [databricks.labs.ucx.framework.crawlers] Inventory table not found
Traceback (most recent call last):
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/framework/crawlers.py", line 152, in _snapshot
    cached_results = list(fetcher())
  File "/home/runner/work/ucx/ucx/src/databricks/labs/ucx/hive_metastore/tables.py", line 458, in _try_fetch
    for row in self._fetch(f"SELECT * FROM {escape_sql_identifier(self.full_name)}"):
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 344, in fetch_all
    execute_response = self.execute(
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 268, in execute
    self._raise_if_needed(status)
  File "/home/runner/work/ucx/ucx/.venv/lib/python3.10/site-packages/databricks/labs/lsql/core.py", line 478, in _raise_if_needed
    raise NotFound(error_message)
databricks.sdk.errors.platform.NotFound: [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dummy_slkt6`.`tables` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 1 pos 14
05:15 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_slkt6.tables] crawling new set of snapshot data for tables
05:15 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.hiveserde_in_place_dour] listing tables and views
05:15 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.hiveserde_in_place_dour.avro_serde_dour] fetching table metadata
05:15 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.hiveserde_in_place_dour.orc_serde_dour] fetching table metadata
05:15 DEBUG [databricks.labs.ucx.hive_metastore.tables] [hive_metastore.hiveserde_in_place_dour.parquet_serde_dour] fetching table metadata
05:16 DEBUG [databricks.labs.ucx.framework.crawlers] [hive_metastore.dummy_slkt6.tables] found 3 new records for tables
05:16 DEBUG [databricks.labs.ucx.hive_metastore.locations] Replacing location dbfs:/mnt/TEST_MOUNT_NAME/a with TEST_MOUNT_CONTAINER/a in dbfs:/mnt/TEST_MOUNT_NAME/a/hiveserde_in_place_dour/parquet_serde_dour
05:16 DEBUG [databricks.labs.ucx.hive_metastore.locations] Replacing location dbfs:/mnt/TEST_MOUNT_NAME/a with TEST_MOUNT_CONTAINER/a in dbfs:/mnt/TEST_MOUNT_NAME/a/hiveserde_in_place_dour/orc_serde_dour
05:16 DEBUG [databricks.labs.ucx.hive_metastore.locations] Replacing location dbfs:/mnt/TEST_MOUNT_NAME/a with TEST_MOUNT_CONTAINER/a in dbfs:/mnt/TEST_MOUNT_NAME/a/hiveserde_in_place_dour/avro_serde_dour
05:16 DEBUG [databricks.labs.ucx.hive_metastore.table_migrate] Migrating external table hive_metastore.hiveserde_in_place_dour.parquet_serde_dour to dummy_cgths.hiveserde_in_place_dour.parquet_serde_dour using SQL query: CREATE TABLE dummy_cgths.hiveserde_in_place_dour.parquet_serde_dour (id INT, region STRING) USING PARQUET PARTITIONED BY (region) LOCATION 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/parquet_serde_dour' TBLPROPERTIES ('transient_lastDdlTime'='1752902133')
05:16 DEBUG [databricks.labs.ucx.hive_metastore.table_migrate] Migrating external table hive_metastore.hiveserde_in_place_dour.avro_serde_dour to dummy_cgths.hiveserde_in_place_dour.avro_serde_dour using SQL query: CREATE TABLE dummy_cgths.hiveserde_in_place_dour.avro_serde_dour (id INT, region STRING) USING AVRO LOCATION 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/avro_serde_dour' TBLPROPERTIES ('avro.schema.literal'='{
                            "namespace": "org.apache.hive",
                            "name": "first_schema",
                            "type": "record",
                            "fields": [
                                { "name":"id", "type":"int" },
                                { "name":"region", "type":"string" }
                            ] }', 'transient_lastDdlTime'='1752902134')
05:16 DEBUG [databricks.labs.ucx.hive_metastore.table_migrate] Migrating external table hive_metastore.hiveserde_in_place_dour.orc_serde_dour to dummy_cgths.hiveserde_in_place_dour.orc_serde_dour using SQL query: CREATE TABLE dummy_cgths.hiveserde_in_place_dour.orc_serde_dour (id INT, region STRING) USING ORC PARTITIONED BY (region) LOCATION 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/orc_serde_dour' TBLPROPERTIES ('transient_lastDdlTime'='1752902134')
05:16 WARNING [databricks.labs.ucx.hive_metastore.table_migrate] failed-to-migrate: Failed to migrate table hive_metastore.hiveserde_in_place_dour.orc_serde_dour to dummy_cgths.hiveserde_in_place_dour.orc_serde_dour: [NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/orc_serde_dour'. Please create an external location on one of the parent paths and then retry the query or command again.
05:16 WARNING [databricks.labs.ucx.hive_metastore.table_migrate] failed-to-migrate: Failed to migrate table hive_metastore.hiveserde_in_place_dour.parquet_serde_dour to dummy_cgths.hiveserde_in_place_dour.parquet_serde_dour: [NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/parquet_serde_dour'. Please create an external location on one of the parent paths and then retry the query or command again.
05:16 WARNING [databricks.labs.ucx.hive_metastore.table_migrate] failed-to-migrate: Failed to migrate table hive_metastore.hiveserde_in_place_dour.avro_serde_dour to dummy_cgths.hiveserde_in_place_dour.avro_serde_dour: [NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/avro_serde_dour'. Please create an external location on one of the parent paths and then retry the query or command again.

Hey guys! Let's dive into an interesting test failure. The test_migrate_external_table_hiveserde_in_place test is throwing an AssertionError saying that parquet_serde_dour wasn't found in dummy_cgths.hiveserde_in_place_dour. Let's work through the logs to see why the table migration didn't go through.

Understanding the Test Failure

First off, let’s break down the error message: "AssertionError: parquet_serde_dour not found in dummy_cgths.hiveserde_in_place_dour". This tells us that the test expected the parquet_serde_dour table to be present in the hiveserde_in_place_dour schema of the dummy_cgths catalog, but it wasn't there. This could point to a few potential issues:

  1. The table migration might have failed.
  2. There might be a problem with how the test is set up or how it's verifying the results.
  3. There could be an underlying issue with the crawler or migration logic itself.

Looking at the log output, a few DEBUG messages give us more context. The crawler tries to fetch the tables inventory from hive_metastore.dummy_slkt6.tables and doesn't find it. That's the "Inventory table not found" message and the databricks.sdk.errors.platform.NotFound traceback with [TABLE_OR_VIEW_NOT_FOUND]. Note that this is logged at DEBUG level, not as a failure, and the very next line shows the crawler moving on to crawl a fresh snapshot, which suggests the inventory table simply didn't exist yet when the test started.

Diving Deeper into the Logs

After the initial failure to find the inventory table, the logs show that the crawler starts crawling a new set of snapshot data for tables. It then lists tables and views under hive_metastore.hiveserde_in_place_dour and fetches metadata for avro_serde_dour, orc_serde_dour, and parquet_serde_dour. This indicates that the crawler is at least able to see these tables in the Hive metastore.
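
To make that fetch-then-crawl behavior concrete, here's a minimal sketch of the pattern the log lines suggest. It is not the UCX implementation: the NotFound class, snapshot, missing_inventory, and crawl_tables are all hypothetical stand-ins written for illustration, and only the table names come from the log.

```python
class NotFound(Exception):
    """Stand-in for the SDK's NotFound error (hypothetical, for illustration)."""


def snapshot(fetcher, loader):
    """Return cached rows if the inventory table exists, else crawl fresh ones."""
    try:
        return list(fetcher())
    except NotFound:
        # A missing inventory table is not fatal: fall back to a fresh crawl.
        return list(loader())


def missing_inventory():
    # Simulates the SELECT against an inventory table that does not exist yet.
    raise NotFound("[TABLE_OR_VIEW_NOT_FOUND] inventory table cannot be found")


def crawl_tables():
    # Table names taken from the log output above.
    return ["avro_serde_dour", "orc_serde_dour", "parquet_serde_dour"]


rows = snapshot(missing_inventory, crawl_tables)
print(f"found {len(rows)} new records for tables")
```

If the real crawler behaves like this sketch, the NotFound traceback is just the first-run path and not the reason the test failed.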

Next, we see debug messages about replacing locations. Here the migration resolves each table's DBFS mount path to the underlying storage path by replacing the prefix dbfs:/mnt/TEST_MOUNT_NAME/a with TEST_MOUNT_CONTAINER/a. As the "in place" in the test name suggests, the data itself isn't moved. This is still a crucial step, because the migrated table has to point directly at the storage location instead of at the mount.
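
Here's a small sketch of what that prefix replacement amounts to. The resolve_mount function is a hypothetical helper written for this article, not the code in databricks.labs.ucx.hive_metastore.locations; the paths are the placeholder values from the log.

```python
def resolve_mount(location: str, mount_prefix: str, storage_prefix: str) -> str:
    """Swap a DBFS mount prefix for the storage path it points at."""
    if location == mount_prefix or location.startswith(mount_prefix + "/"):
        return storage_prefix + location[len(mount_prefix):]
    return location  # not under this mount, leave it untouched


resolved = resolve_mount(
    "dbfs:/mnt/TEST_MOUNT_NAME/a/hiveserde_in_place_dour/parquet_serde_dour",
    "dbfs:/mnt/TEST_MOUNT_NAME/a",
    "TEST_MOUNT_CONTAINER/a",
)
print(resolved)
```

The result is the same path that later shows up in the LOCATION clause of the CREATE TABLE statement.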

The logs then show the SQL queries being used to migrate the tables. For example, the query to migrate parquet_serde_dour looks like this:

CREATE TABLE dummy_cgths.hiveserde_in_place_dour.parquet_serde_dour (id INT, region STRING)
USING PARQUET
PARTITIONED BY (region)
LOCATION 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/parquet_serde_dour'
TBLPROPERTIES ('transient_lastDdlTime'='1752902133')

This query creates a new table in the dummy_cgths.hiveserde_in_place_dour schema, using the Parquet format, partitioned by region, and pointing to the specified location. Similar queries are executed for avro_serde_dour and orc_serde_dour.

However, the logs also contain WARNING messages indicating that all three table migrations failed with a "NO_PARENT_EXTERNAL_LOCATION_FOR_PATH" error. This error message is a big clue! It says that no external location is defined for the path TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/... or for any of its parent paths. Without an external location covering the path, Unity Catalog refuses to create the external table there, so the CREATE TABLE statements fail.
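
When a log is this noisy, it can help to pull out just the failed migrations. The following is a rough sketch that assumes the warning lines keep the "failed-to-migrate" wording seen above. The regular expression and the failed_migrations helper are my own illustration, and the sample line is copied from the log with its last sentence abridged.

```python
import re

# Matches the warning format seen in the log; the format is an assumption
# based on these lines only and may differ between UCX versions.
FAILED = re.compile(
    r"failed-to-migrate: Failed to migrate table (?P<src>\S+) to (?P<dst>\S+): "
    r"\[(?P<code>[A-Z_]+)\].*?path '(?P<path>[^']+)'"
)


def failed_migrations(log_text: str) -> list[dict]:
    """Return one dict per failed migration found in the log text."""
    return [match.groupdict() for match in FAILED.finditer(log_text)]


sample = (
    "05:16 WARNING [databricks.labs.ucx.hive_metastore.table_migrate] failed-to-migrate: "
    "Failed to migrate table hive_metastore.hiveserde_in_place_dour.orc_serde_dour "
    "to dummy_cgths.hiveserde_in_place_dour.orc_serde_dour: "
    "[NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found "
    "for path 'TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/orc_serde_dour'. "
    "Please create an external location on one of the parent paths.\n"  # abridged
)

failures = failed_migrations(sample)
for failure in failures:
    print(failure["dst"], failure["code"], failure["path"])
```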

Identifying the Root Cause

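Based on the logs, the primary cause of the test failure seems to be the missing external location. The NO_PARENT_EXTERNAL_LOCATION_FOR_PATH error is a clear indication that no external location covers the path of the migrated tables. Because the migration only logs a warning and carries on, the test gets as far as its assertion, and that's where parquet_serde_dour turns out to be missing from the target schema. A missing external location can happen if: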

  • The external location wasn't created before running the test.
  • The external location was created with a different path than what the migration process is expecting.
  • There's a configuration issue that prevents the system from accessing the external location.

The initial TABLE_OR_VIEW_NOT_FOUND error for the inventory table hive_metastore.dummy_slkt6.tables looks like a secondary matter at most. The crawler recovered from it, crawled a new snapshot, and found 3 records, so it's unlikely to be what broke the test. It's still worth confirming that this is the expected behavior when the inventory schema is fresh.

Steps to Resolve the Test Failure

To fix this issue, we need to ensure that the external locations are properly configured before running the migration tests. Here’s a breakdown of the steps we should take:

  1. Verify External Location Configuration: The most crucial step is to check if the external location TEST_MOUNT_CONTAINER/a or one of its parent paths is properly configured in the Databricks environment. This involves checking the metastore configuration and ensuring that the external location is defined and accessible.

  2. Create External Location if Missing: If the external location doesn’t exist, we need to create it. This can be done using the Databricks CLI, the Databricks UI, or through SQL commands. For example, you might use the CREATE EXTERNAL LOCATION command in SQL.

  3. Ensure Correct Path: Double-check that the path used in the external location configuration matches the path used in the table migration queries (TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/...). Any mismatch in paths will lead to the NO_PARENT_EXTERNAL_LOCATION_FOR_PATH error.

  4. Investigate Inventory Table Issue: We should also check why the hive_metastore.dummy_slkt6.tables table is not being found initially. Since the crawler goes on to crawl a fresh snapshot and finds all three tables, this is probably expected on a first run, but it's worth confirming against the test setup. If the message turns out to be harmless, no change is needed for this particular test.

  5. Retry the Test: After ensuring that the external locations are correctly configured and the inventory table issue is resolved, we should retry the test_migrate_external_table_hiveserde_in_place test to see if the failure is resolved.

  6. Add Logging and Error Handling: To prevent similar issues in the future, we should consider adding more robust logging and error handling to the table migration code. This will help us identify and diagnose problems more quickly.
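
For steps 1 and 3, a quick way to reason about "is there a parent external location for this path?" is a prefix check. The sketch below is a simplified approximation using plain string prefixes, and Unity Catalog's actual matching rules may differ. The find_parent_location helper is hypothetical, and the list of registered URLs is something you'd have to fetch from your own workspace.

```python
from typing import Iterable, Optional


def find_parent_location(table_path: str, location_urls: Iterable[str]) -> Optional[str]:
    """Return the longest registered URL that is a parent of table_path, or None."""
    path = table_path.rstrip("/")
    best = None
    for url in location_urls:
        prefix = url.rstrip("/")
        # Require a path-segment boundary so ".../a" does not match ".../ab".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return best


table_path = "TEST_MOUNT_CONTAINER/a/hiveserde_in_place_dour/parquet_serde_dour"

# With no external locations registered, nothing covers the table path.
print(find_parent_location(table_path, []))

# Once a location on a parent path exists, the lookup succeeds.
print(find_parent_location(table_path, ["TEST_MOUNT_CONTAINER/a"]))
```

The first call mirrors the situation in the failing test run. The second shows what the check should return after the external location has been created.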

Example: Creating an External Location in Databricks

If you find that the external location is missing, you can create it using a SQL command like this:

CREATE EXTERNAL LOCATION IF NOT EXISTS `test_mount_location`
URL 'TEST_MOUNT_CONTAINER/a'
WITH (STORAGE CREDENTIAL `your_storage_credential`);

Replace TEST_MOUNT_CONTAINER/a with the actual cloud storage URL and your_storage_credential with the name of your storage credential. This command registers the path with Unity Catalog so that external tables can be created at or below it.
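
If you'd rather generate that statement from test setup code, here's a minimal sketch that only builds the SQL string. The quote and create_external_location_sql helpers are hypothetical names of my own, the values are the same placeholders as above, and how you execute the statement is up to your environment.

```python
def quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement like the one shown above."""
    if "'" in url:
        # Keep the sketch simple: refuse URLs that would break the string literal.
        raise ValueError("URL must not contain single quotes")
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote(name)}\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL {quote(credential)})"
    )


sql = create_external_location_sql(
    "test_mount_location",      # placeholder name from the example above
    "TEST_MOUNT_CONTAINER/a",   # placeholder, use your real storage URL
    "your_storage_credential",  # placeholder credential name
)
print(sql)
```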

Conclusion: Fixing the Migration Test Failure

In summary, the test_migrate_external_table_hiveserde_in_place test is failing because the required external location is not properly configured in the Databricks environment. The NO_PARENT_EXTERNAL_LOCATION_FOR_PATH error is the key indicator here. By verifying and creating the external location, ensuring the correct path, and investigating the inventory table issue, we can resolve this failure and ensure our table migration process is working as expected.

So, let's get those external locations set up, and we should be good to go! Happy debugging, folks!