Skip to content

HIVE-15984: Allow ALTER TABLE DROP COLUMN #5715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ramitg254
Copy link
Contributor

@ramitg254 ramitg254 commented Mar 25, 2025

What changes were proposed in this pull request?

The changes allow dropping column via alter table drop column

Why are the changes needed?

Before, the drop column query was not there as regarding sql coverage.

Does this PR introduce any user-facing change?

Now, user can also drop column via alter table drop column and no need to use replace columns for it

Is the change a dependency upgrade?

No

How was this patch tested?

I tested it on my local machine.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds support for the ALTER TABLE DROP COLUMN command in Hive. Key changes include the introduction of a new descriptor and analyzer for the drop column syntax, updates to the Iceberg meta hook to handle drop column operations, and adjustments in privilege and error messaging to facilitate the new functionality.

Reviewed Changes

Copilot reviewed 14 out of 23 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnDesc.java New descriptor for DROP COLUMN operations
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java Updated to support the drop column operation and enforce single-column drop semantics
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnAnalyser.java Analyzer implementation for the DROP COLUMN statement
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnOperation.java Implementation of the drop column alteration including SerDe handling and column removal logic
common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java New error messages for drop column operations
Others Updates in enums, privileges, and test cases to accommodate the new drop column behavior
Files not reviewed (9)
  • iceberg/iceberg-handler/src/test/queries/positive/iceberg_drop_column.q: Language not supported
  • iceberg/iceberg-handler/src/test/results/positive/iceberg_drop_column.q.out: Language not supported
  • parser/src/java/org/apache/hadoop/hive/ql/parse/AlterClauseParser.g: Language not supported
  • parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: Language not supported
  • ql/src/test/queries/clientnegative/orc_drop_column.q: Language not supported
  • ql/src/test/queries/clientnegative/parquet_drop_column.q: Language not supported
  • ql/src/test/queries/clientpositive/orc_drop_column.q: Language not supported
  • ql/src/test/queries/clientpositive/parquet_drop_column.q: Language not supported
  • ql/src/test/results/clientnegative/orc_drop_column.q.out: Language not supported
Comments suppressed due to low confidence (2)

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java:948

  • The code throws an exception when more than one column is detected for a drop operation; please verify that supporting only single-column drops is the intended behavior.
if (schemaDifference.getMissingFromFirst().size() > 1) {

ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/drop/AlterTableDropColumnOperation.java:61

  • [nitpick] Consider defining a constant for the SerDe string 'org.apache.hadoop.hive.serde.thrift.columnsetSerDe' to improve maintainability and reduce potential typos.
if ("org.apache.hadoop.hive.serde.thrift.columnsetSerDe".equals(serializationLib)) {

@Override
protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command)
throws SemanticException {
boolean ifExists = (command.getFirstChildWithType(HiveParser.TOK_IFEXISTS)!=null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parantheses are probably unnecessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed parenthesis

}
}
if(droppingIndex==-1){
if(desc.isIfExists())return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using braces for an if block even when the block consists of a single line of code is a good practice to follow :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added braces

SET hive.exec.schema.evolution=true;
create table src_orc (key int, val string) stored as orc;
insert into src_orc values (1, "Alice");
alter table src_orc drop column val;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a newline at EOF for this and other .q files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in it and others as well

Copy link
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed this only partially. Other than specific comments, I found that the spacing is inconsistent in various places. It is a good practice to not deviate from the coding convention for better readability and maintainability. You might find this helpful: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions

throw new MetaException("Unsupported operation to use DROP COLUMN for adding a column, changing a " +
"column type, column comment or reordering columns. Only use DROP COLUMN for dropping a column. " +
"For the other operations, consider using the ADD COLUMNS or CHANGE COLUMN commands.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this Exception is handling too many scenarios. It might be alright but this could be broken into more specific scenarios. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll separate them

POSTHOOK: type: QUERY
POSTHOOK: Input: default@src_orc
#### A masked pattern was here ####
1
Copy link
Contributor

@soumyakanti3578 soumyakanti3578 Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In these tests it might be useful to set hive.cli.print.header=true to get the column names in the q.out files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I'll add that

@@ -45,6 +45,8 @@ public enum HiveOperation {
MSCK("MSCK", HiveParser.TOK_MSCK, null, null),
ALTERTABLE_ADDCOLS("ALTERTABLE_ADDCOLS", HiveParser.TOK_ALTERTABLE_ADDCOLS, new Privilege[]{Privilege.ALTER_METADATA},
null),
ALTERTABLE_DROPCOL("ALTERTABLE_DROPCOL",HiveParser.TOK_ALTERTABLE_DROPCOL,new Privilege[]{Privilege.ALTER_METADATA},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space

Copy link
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding more tests as this is an important new feature:

  1. Negative: Dropping a column which is not present
  2. Negative: Dropping a column twice
  3. Have 2-3 partitions in the partitioned tables
  4. Test drop column with show commands like show columns, show create table
  5. Test with describe [formatted] as they list columns too

Also any other aspect which you can think of :)

@ramitg254
Copy link
Contributor Author

Consider adding more tests as this is an important new feature:

  1. Negative: Dropping a column which is not present
  2. Negative: Dropping a column twice
  3. Have 2-3 partitions in the partitioned tables
  4. Test drop column with show commands like show columns, show create table
  5. Test with describe [formatted] as they list columns too

Also any other aspect which you can think of :)

tests added

@@ -362,6 +363,8 @@ public enum ErrorMsg {
DBTXNMGR_REQUIRES_CONCURRENCY(10264,
"To use DbTxnManager you must set hive.support.concurrency=true"),
TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true),
COLUMN_NOT_FOUND(10266, "Invalid Column Name"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use INVALID_COLUMN instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, used already present INVALID_COLUMN_NAME in place of it

@@ -362,6 +363,8 @@ public enum ErrorMsg {
DBTXNMGR_REQUIRES_CONCURRENCY(10264,
"To use DbTxnManager you must set hive.support.concurrency=true"),
TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true),
COLUMN_NOT_FOUND(10266, "Invalid Column Name"),
DROP_COLUMN_UNCASCADED(10267, "Drop column is not supported without cascade for partitioned table. SerDe may be incompatible.", true),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an equivalent for REPLACE_COLUMNS, maybe stick with CANNOT_DROP_COLUMN?

Copy link
Contributor Author

@ramitg254 ramitg254 Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is added for more explainatory exception in case of dropping columns without cascade in case of parquet:

if (ParquetHiveSerDe.isParquetTable(table) && AlterTableUtils.isSchemaEvolutionEnabled(table, context.getConf()) &&
                 !desc.isCascade() && table.isPartitioned()) {
             LOG.warn("Cannot drop column from a partitioned parquet table without the CASCADE option");
             throw new HiveException(ErrorMsg.DROP_COLUMN_UNCASCADED, desc.getDbTableName());
         }

HiveSchemaUtil.SchemaDifference schemaDifference = HiveSchemaUtil.getSchemaDiff(hmsCols, icebergCols, true);

if (schemaDifference.getMissingFromFirst().size() > 1) {
throw new MetaException("Cannot drop more than one column");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the Parser validate this?

Copy link
Contributor Author

@ramitg254 ramitg254 Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah parser will validate this I added it to be on safer side, I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed

throw new MetaException("Unsupported operation to use DROP COLUMN for changing a column comment");
}

if (outOfOrder != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and other 3 if are handling the below exception which is present in handleReplaceColumns in an elaborative manner:

if (!schemaDifference.getMissingFromSecond().isEmpty() || !schemaDifference.getTypeChanged().isEmpty() ||
        !schemaDifference.getCommentChanged().isEmpty() || outOfOrder != null) {
      throw new MetaException("Unsupported operation to use REPLACE COLUMNS for adding a column, changing a " +
          "column type, column comment or reordering columns. Only use REPLACE COLUMNS for dropping columns. " +
          "For the other operations, consider using the ADD COLUMNS or CHANGE COLUMN commands.");
    }

@@ -927,6 +931,52 @@ private void handleAddColumns(org.apache.hadoop.hive.metastore.api.Table hmsTabl
updateSchema.commit();
}

private void handleDropColumn(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws MetaException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply reuse handleReplaceColumns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because handleReplaceColumns is also handling addition of columns and changing a column

@init { gParent.pushMsg("drop column statement", state); }
@after { gParent.popMsg(state); }
: KW_DROP KW_COLUMN ifExists? columnName restrictOrCascade?
-> ^(TOK_ALTERTABLE_DROPCOL ifExists? columnName restrictOrCascade?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for restrictOrCascade?

Copy link
Contributor Author

@ramitg254 ramitg254 Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is test for cascade added when using desc formatted table and desc formatted table partition (something) yield different results for columns present when a column is dropped without cascade as in case of specific partition it also contains dropped column in case normal hive tables;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no test for restrict , i added it because it was also present in case of parsing grammer of replace columns although for no use

@@ -30,6 +30,7 @@
public enum AlterTableType {
// column
ADDCOLS("add columns"),
DROPCOL("drop column"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use full name DROP_COLUMN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed

protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command)
throws SemanticException {
boolean ifExists = command.getFirstChildWithType(HiveParser.TOK_IFEXISTS) != null;
String colName = ifExists ? command.getChild(1).getText().toLowerCase() : command.getChild(0).getText().toLowerCase();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String colName = command.getChild((ifExists) ? 1 : 0).getText().toLowerCase();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

boolean ifExists = command.getFirstChildWithType(HiveParser.TOK_IFEXISTS) != null;
String colName = ifExists ? command.getChild(1).getText().toLowerCase() : command.getChild(0).getText().toLowerCase();
boolean isCascade = false;
if (null != command.getFirstChildWithType(HiveParser.TOK_CASCADE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need if, just assign the result from bool condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

super(context, desc);
}

private static final Set<String> VALID_SERIALIZATION_LIBS = ImmutableSet.of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be removed it came by mistake from AlterTableReplaceColumnsOperation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, removed

/**
* Operation process of dropping column.
*/
public class AlterTableDropColumnOperation extends AbstractAlterTableOperation<AlterTableDropColumnDesc> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be different from AlterTableReplaceColumnsOperation?

Copy link
Contributor Author

@ramitg254 ramitg254 Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlterTableReplaceColumnsOperation is replacing the current columnNameTypeList with provided columnNameTypeList in the query whereas AlterTableDropColumnOperation is dropping that particular column from list of columns whose name is specified in the query.

/**
* DDL task description for ALTER TABLE ... DROP COLUMN ... command.
*/
@Explain(displayName = "Drop Column", explainLevels = {Level.USER, Level.DEFAULT, Level.EXTENDED})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't find any tests that validate 'EXPLAIN'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll modify the tests for explain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done , added explain statement in tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants