-
Notifications
You must be signed in to change notification settings - Fork 4.7k
HIVE-15984: Allow ALTER TABLE DROP COLUMN #5715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds support for the ALTER TABLE DROP COLUMN command in Hive. Key changes include the introduction of a new descriptor and analyzer for the drop column syntax, updates to the Iceberg meta hook to handle drop column operations, and adjustments in privilege and error messaging to facilitate the new functionality.
Reviewed Changes
Copilot reviewed 14 out of 23 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnDesc.java | New descriptor for DROP COLUMN operations |
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java | Updated to support the drop column operation and enforce single-column drop semantics |
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnAnalyser.java | Analyzer implementation for the DROP COLUMN statement |
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnOperation.java | Implementation of the drop column alteration including SerDe handling and column removal logic |
common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java | New error messages for drop column operations |
Others | Updates in enums, privileges, and test cases to accommodate the new drop column behavior |
Files not reviewed (9)
- iceberg/iceberg-handler/src/test/queries/positive/iceberg_drop_column.q: Language not supported
- iceberg/iceberg-handler/src/test/results/positive/iceberg_drop_column.q.out: Language not supported
- parser/src/java/org/apache/hadoop/hive/ql/parse/AlterClauseParser.g: Language not supported
- parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: Language not supported
- ql/src/test/queries/clientnegative/orc_drop_column.q: Language not supported
- ql/src/test/queries/clientnegative/parquet_drop_column.q: Language not supported
- ql/src/test/queries/clientpositive/orc_drop_column.q: Language not supported
- ql/src/test/queries/clientpositive/parquet_drop_column.q: Language not supported
- ql/src/test/results/clientnegative/orc_drop_column.q.out: Language not supported
Comments suppressed due to low confidence (2)
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java:948
- The code throws an exception when more than one column is detected for a drop operation; please verify that supporting only single-column drops is the intended behavior.
if (schemaDifference.getMissingFromFirst().size() > 1) {
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnOperation.java:61
- [nitpick] Consider defining a constant for the SerDe string 'org.apache.hadoop.hive.serde.thrift.columnsetSerDe' to improve maintainability and reduce potential typos.
if ("org.apache.hadoop.hive.serde.thrift.columnsetSerDe".equals(serializationLib)) {
@Override | ||
protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command) | ||
throws SemanticException { | ||
boolean ifExists = (command.getFirstChildWithType(HiveParser.TOK_IFEXISTS)!=null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parantheses are probably unnecessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed parenthesis
} | ||
} | ||
if(droppingIndex==-1){ | ||
if(desc.isIfExists())return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using braces for an if
block even when the block consists of a single line of code is a good practice to follow :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added braces
SET hive.exec.schema.evolution=true; | ||
create table src_orc (key int, val string) stored as orc; | ||
insert into src_orc values (1, "Alice"); | ||
alter table src_orc drop column val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a newline at EOF for this and other .q
files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in it and others as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed this only partially. Other than specific comments, I found that the spacing is inconsistent in various places. It is a good practice to not deviate from the coding convention for better readability and maintainability. You might find this helpful: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
throw new MetaException("Unsupported operation to use DROP COLUMN for adding a column, changing a " + | ||
"column type, column comment or reordering columns. Only use DROP COLUMN for dropping a column. " + | ||
"For the other operations, consider using the ADD COLUMNS or CHANGE COLUMN commands."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this Exception is handling too many scenarios. It might be alright but this could be broken into more specific scenarios. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll separate them
POSTHOOK: type: QUERY | ||
POSTHOOK: Input: default@src_orc | ||
#### A masked pattern was here #### | ||
1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: In these tests it might be useful to set hive.cli.print.header=true
to get the column names in the q.out
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I'll add that
@@ -45,6 +45,8 @@ public enum HiveOperation { | |||
MSCK("MSCK", HiveParser.TOK_MSCK, null, null), | |||
ALTERTABLE_ADDCOLS("ALTERTABLE_ADDCOLS", HiveParser.TOK_ALTERTABLE_ADDCOLS, new Privilege[]{Privilege.ALTER_METADATA}, | |||
null), | |||
ALTERTABLE_DROPCOL("ALTERTABLE_DROPCOL",HiveParser.TOK_ALTERTABLE_DROPCOL,new Privilege[]{Privilege.ALTER_METADATA}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding more tests as this is an important new feature:
- Negative: Dropping a column which is not present
- Negative: Dropping a column twice
- Have 2-3 partitions in the partitioned tables
- Test drop column with
show
commands likeshow columns
,show create table
- Test with
describe [formatted]
as they list columns too
Also any other aspect which you can think of :)
tests added |
@@ -362,6 +363,8 @@ public enum ErrorMsg { | |||
DBTXNMGR_REQUIRES_CONCURRENCY(10264, | |||
"To use DbTxnManager you must set hive.support.concurrency=true"), | |||
TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true), | |||
COLUMN_NOT_FOUND(10266, "Invalid Column Name"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use INVALID_COLUMN
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, used already present INVALID_COLUMN_NAME in place of it
@@ -362,6 +363,8 @@ public enum ErrorMsg { | |||
DBTXNMGR_REQUIRES_CONCURRENCY(10264, | |||
"To use DbTxnManager you must set hive.support.concurrency=true"), | |||
TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true), | |||
COLUMN_NOT_FOUND(10266, "Invalid Column Name"), | |||
DROP_COLUMN_UNCASCADED(10267, "Drop column is not supported without cascade for partitioned table. SerDe may be incompatible.", true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an equivalent for REPLACE_COLUMNS
, maybe stick with CANNOT_DROP_COLUMN
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is added for more explainatory exception in case of dropping columns without cascade in case of parquet:
if (ParquetHiveSerDe.isParquetTable(table) && AlterTableUtils.isSchemaEvolutionEnabled(table, context.getConf()) &&
!desc.isCascade() && table.isPartitioned()) {
LOG.warn("Cannot drop column from a partitioned parquet table without the CASCADE option");
throw new HiveException(ErrorMsg.DROP_COLUMN_UNCASCADED, desc.getDbTableName());
}
HiveSchemaUtil.SchemaDifference schemaDifference = HiveSchemaUtil.getSchemaDiff(hmsCols, icebergCols, true); | ||
|
||
if (schemaDifference.getMissingFromFirst().size() > 1) { | ||
throw new MetaException("Cannot drop more than one column"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the Parser validate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah parser will validate this I added it to be on safer side, I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, removed
throw new MetaException("Unsupported operation to use DROP COLUMN for changing a column comment"); | ||
} | ||
|
||
if (outOfOrder != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is that needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this and other 3 if are handling the below exception which is present in handleReplaceColumns in an elaborative manner:
if (!schemaDifference.getMissingFromSecond().isEmpty() || !schemaDifference.getTypeChanged().isEmpty() ||
!schemaDifference.getCommentChanged().isEmpty() || outOfOrder != null) {
throw new MetaException("Unsupported operation to use REPLACE COLUMNS for adding a column, changing a " +
"column type, column comment or reordering columns. Only use REPLACE COLUMNS for dropping columns. " +
"For the other operations, consider using the ADD COLUMNS or CHANGE COLUMN commands.");
}
@@ -927,6 +931,52 @@ private void handleAddColumns(org.apache.hadoop.hive.metastore.api.Table hmsTabl | |||
updateSchema.commit(); | |||
} | |||
|
|||
private void handleDropColumn(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws MetaException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply reuse handleReplaceColumns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because handleReplaceColumns is also handling addition of columns and changing a column
@init { gParent.pushMsg("drop column statement", state); } | ||
@after { gParent.popMsg(state); } | ||
: KW_DROP KW_COLUMN ifExists? columnName restrictOrCascade? | ||
-> ^(TOK_ALTERTABLE_DROPCOL ifExists? columnName restrictOrCascade?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests for restrictOrCascade
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is test for cascade added when using desc formatted table
and desc formatted table partition (something)
yield different results for columns present when a column is dropped without cascade as in case of specific partition it also contains dropped column in case normal hive tables;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no test for restrict , i added it because it was also present in case of parsing grammer of replace columns although for no use
@@ -30,6 +30,7 @@ | |||
public enum AlterTableType { | |||
// column | |||
ADDCOLS("add columns"), | |||
DROPCOL("drop column"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use full name DROP_COLUMN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, changed
protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command) | ||
throws SemanticException { | ||
boolean ifExists = command.getFirstChildWithType(HiveParser.TOK_IFEXISTS) != null; | ||
String colName = ifExists ? command.getChild(1).getText().toLowerCase() : command.getChild(0).getText().toLowerCase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String colName = command.getChild((ifExists) ? 1 : 0).getText().toLowerCase();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
boolean ifExists = command.getFirstChildWithType(HiveParser.TOK_IFEXISTS) != null; | ||
String colName = ifExists ? command.getChild(1).getText().toLowerCase() : command.getChild(0).getText().toLowerCase(); | ||
boolean isCascade = false; | ||
if (null != command.getFirstChildWithType(HiveParser.TOK_CASCADE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need if, just assign the result from bool condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
super(context, desc); | ||
} | ||
|
||
private static final Set<String> VALID_SERIALIZATION_LIBS = ImmutableSet.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be removed it came by mistake from AlterTableReplaceColumnsOperation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, removed
/** | ||
* Operation process of dropping column. | ||
*/ | ||
public class AlterTableDropColumnOperation extends AbstractAlterTableOperation<AlterTableDropColumnDesc> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be different from AlterTableReplaceColumnsOperation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AlterTableReplaceColumnsOperation
is replacing the current columnNameTypeList with provided columnNameTypeList in the query whereas AlterTableDropColumnOperation
is dropping that particular column from list of columns whose name is specified in the query.
/** | ||
* DDL task description for ALTER TABLE ... DROP COLUMN ... command. | ||
*/ | ||
@Explain(displayName = "Drop Column", explainLevels = {Level.USER, Level.DEFAULT, Level.EXTENDED}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't find any tests that validate 'EXPLAIN'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll modify the tests for explain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done , added explain statement in tests
|
What changes were proposed in this pull request?
The changes allow dropping column via alter table drop column
Why are the changes needed?
Before, the drop column query was not there as regarding sql coverage.
Does this PR introduce any user-facing change?
Now, user can also drop column via alter table drop column and no need to use replace columns for it
Is the change a dependency upgrade?
No
How was this patch tested?
I tested it on my local machine.