I am trying to migrate the AWS Glue Data Catalog to the Hive metastore of an EMR cluster (I use an external MySQL database as the Hive metastore).
I followed all the steps to migrate directly from AWS Glue to Hive, but I get the error "'str' object has no attribute '_jdf'" when I run the Glue ETL job. See the full error message below:
2021-11-11 09:33:53,573 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/export_from_datacatalog.py", line 138, in <module>
main()
File "/tmp/export_from_datacatalog.py", line 134, in main
connection=glue_context.extract_jdbc_conf(connection_name)
File "/tmp/export_from_datacatalog.py", line 38, in datacatalog_migrate_to_hive_metastore
transform_databases_tables_partitions(sc, sql_context, hive_metastore, databases, tables, partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1445, in transform_databases_tables_partitions
.transform(hms=hive_metastore, databases=databases, tables=tables, partitions=partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1227, in transform
(ms_sds, ms_tbls, ms_partitions) = self.extract_sds(ms_tbls, ms_partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1018, in extract_sds
.drop_columns(['ID', 'type'])
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 182, in drop_columns
df = df.drop(col)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2519, in drop
jdf = self._jdf.drop(self._jseq(cols))
AttributeError: 'str' object has no attribute '_jdf'
Hi @hlmiao,
I'm trying to do the opposite of what you are doing. I have been tracking down this error and found that the problem is the binding of methods such as drop_columns to the DataFrame class. This binding does not work as expected; I modified the script to remove these bindings, and the script now gets past this point.
I still have other bugs in the script, but I hope this workaround fixes your problem.
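To illustrate the workaround, here is a minimal sketch of calling a helper such as drop_columns as an ordinary function instead of attaching it to the DataFrame class. It is not the code from hive_metastore_migration.py: `FrameStandIn` is a hypothetical stand-in for a Spark DataFrame so the sketch runs without Spark, and the column names other than 'ID' and 'type' are made up for the example.

```python
from functools import reduce


class FrameStandIn:
    """Hypothetical stand-in for a Spark DataFrame (only .columns and .drop)."""

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, *cols):
        # Return a new object without the named columns; the original is untouched.
        return FrameStandIn(c for c in self.columns if c not in cols)


def drop_columns(df, columns_to_drop):
    """Plain helper: drop each named column in turn and return the result."""
    return reduce(lambda frame, col: frame.drop(col), columns_to_drop, df)


# Binding style (what the workaround removes), shown only as a comment:
#     FrameStandIn.drop_columns = drop_columns
#     ms_sds.drop_columns(['ID', 'type'])
# Workaround style: pass the frame explicitly to the helper.
ms_sds = FrameStandIn(["ID", "type", "location", "inputFormat"])  # assumed columns
result = drop_columns(ms_sds, ["ID", "type"])
print(result.columns)
```

With a real DataFrame the call sites would change the same way, for example from `df.drop_columns([...])` to `drop_columns(df, [...])`, so the helper no longer depends on how the method is bound to the class.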