libin's Personal Website 3.0 - HBase PythonAPI

HBase PythonAPI

2021-12-25 13:25:52 Saturday | Views: 972

![](/static/images/article_images/1693147445.004766.jpeg)

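For working with HBase from Python, the best third-party library currently available is happybase, which supports basic operations such as inserting, deleting, and updating data.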

`pip3 install happybase`

#### Exploring HBase table data
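```python
# -*- coding: utf-8 -*-
import happybase
import time
import threading
import json

HOST = "ip"  # placeholder for the HBase Thrift server host
PORT = 6004
SIZE = 11
TABLENAME = "namespace:tablename"


class HBaseResearch():
    # The original listing does not show a constructor. Building self.pool
    # from HOST, PORT and SIZE as below is an assumption.
    def __init__(self):
        self.pool = happybase.ConnectionPool(size=SIZE, host=HOST, port=PORT)

    def print_data_sample(self, tablename, limit):
        # Print the first `limit` rows of the table
        with self.pool.connection() as conn:
            table = conn.table(bytes(tablename, encoding="utf8"))
            scanner = table.scan(batch_size=1000, limit=limit)
            for rowkey, values in scanner:
                print(f"RK:\n{rowkey}\nVALUES:\n{values}\n")
                print("-" * 80)
```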
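When exploring an unfamiliar table, it can help to see which columns actually occur in the sample rather than reading raw rows. The following is a minimal sketch under that assumption: `summarize_columns` is a hypothetical helper (not part of happybase or the original class) that accepts any iterable of `(rowkey, values)` pairs shaped like the output of `table.scan()`. The sample data is made up for illustration.

```python
from collections import Counter


def summarize_columns(rows):
    """Count how often each column appears in a sample of scanned rows.

    `rows` is any iterable of (rowkey, values) pairs, where `values` maps
    b"family:qualifier" to a bytes cell value (the shape yielded by
    happybase's Table.scan()).
    """
    column_counts = Counter()
    row_total = 0
    for _rowkey, values in rows:
        row_total += 1
        for column in values:
            column_counts[column.decode("utf-8")] += 1
    return row_total, dict(column_counts)


# Made-up stand-in for scan() output; no HBase connection is needed here.
sample = [
    (b"0001", {b"cf:name": b"a", b"cf:age": b"1"}),
    (b"0002", {b"cf:name": b"b"}),
]
row_total, column_counts = summarize_columns(sample)
print(row_total, column_counts)
```

Against a real table, the scanner returned by `table.scan(limit=...)` could be passed in place of `sample`.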

#### Getting the region information of a table
```python
    # Get the region information of a table
    def get_regions_info(self, tablename):
        with self.pool.connection() as conn:
            table = conn.table(bytes(tablename, encoding="utf8"))
            regions = table.regions()
            servers = []
            for region in regions:
                servers.append(region.get("server_name"))
                print(region)
            distinct_servers = set(servers)
            print("-" * 60)
        return f"\nRegion Number: {len(regions)}\nServers Number: {len(distinct_servers)}\nServer Detail: {distinct_servers}"
```
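The method above reports how many distinct servers host the table, but not how evenly the regions are spread across them. A small sketch of that extra step follows, assuming each region is a dict with a `server_name` key as in the code above. `regions_per_server` is a hypothetical helper, and the sample regions are made up.

```python
from collections import Counter


def regions_per_server(regions):
    """Return {server_name: region_count}, busiest server first."""
    counts = Counter(region.get("server_name") for region in regions)
    return dict(counts.most_common())


# Made-up region dicts; only the "server_name" key is used here.
sample_regions = [
    {"name": "r1", "server_name": "rs1"},
    {"name": "r2", "server_name": "rs2"},
    {"name": "r3", "server_name": "rs1"},
]
distribution = regions_per_server(sample_regions)
print(distribution)
```

A heavily skewed distribution can be a hint that the RowKey design concentrates data on a few region servers.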

#### Fetching data by RowKey range
```python
    # Note: self.my_hex is called below but is not shown in the original listing.
    def download_data_from_key(self, tablename, start_key, end_key, filepath):
        with self.pool.connection() as conn:
            start_time = time.time()
            counter = 0
            thread_name = threading.current_thread().name
            table = conn.table(bytes(tablename, encoding="utf8"))
            print(f"Thread: {thread_name}\tstart_key:{start_key}-{self.my_hex(start_key)}\tend_key:{end_key}-{self.my_hex(end_key)}")
            for num in range(start_key, end_key):  # build the current row prefix (end_key is exclusive)
                row_prefix = bytes(self.my_hex(num), encoding="utf8")
                prefix_start_time = time.time()
                scanner = table.scan(batch_size=5000, row_prefix=row_prefix)
                for row_key, values in scanner:
                    lines = {}
                    # Convert the binary column names and values to str
                    for field, value in values.items():
                        lines[str(field, encoding="utf-8")] = str(value, encoding="utf-8")
                    # Serialize the dict as JSON and write it to a local file
                    # (the file must be opened in append mode)
                    with open(filepath, "a", encoding="utf-8") as f:
                        f.write(json.dumps(lines, ensure_ascii=False) + "\n")
                        counter += 1
                prefix_end_time = time.time()
                print(f"Thread {thread_name} finished row_prefix {row_prefix} in {prefix_end_time - prefix_start_time}s")
            end_time = time.time()
            print(f"Thread {thread_name} wrote {counter} rows in {end_time - start_time}s")
```
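The method logs the current thread's name, which suggests it is meant to be called from several threads, each handling its own sub-range of keys. The driver and the `my_hex` helper are not shown in the post, so the following is only a sketch under assumptions: `my_hex` here is a hypothetical stand-in that renders a number as zero-padded hex (the real prefix format depends on the table's RowKey design), and `split_range` and `run_in_threads` are illustrative names.

```python
import threading


def my_hex(num, width=2):
    """Hypothetical stand-in: zero-padded lowercase hex, e.g. 10 -> '0a'."""
    return format(num, f"0{width}x")


def split_range(start_key, end_key, parts):
    """Split [start_key, end_key) into at most `parts` contiguous sub-ranges."""
    step, extra = divmod(end_key - start_key, parts)
    ranges = []
    lo = start_key
    for i in range(parts):
        # The first `extra` sub-ranges take one additional key each.
        hi = lo + step + (1 if i < extra else 0)
        if hi > lo:
            ranges.append((lo, hi))
        lo = hi
    return ranges


def run_in_threads(worker, start_key, end_key, parts):
    """Call worker(lo, hi) for every sub-range, one thread per sub-range."""
    threads = [
        threading.Thread(target=worker, args=(lo, hi), name=f"worker-{i}")
        for i, (lo, hi) in enumerate(split_range(start_key, end_key, parts))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Demo with a fake worker that only records the range it was given.
seen = []
seen_lock = threading.Lock()


def fake_worker(lo, hi):
    with seen_lock:
        seen.append((lo, hi))


run_in_threads(fake_worker, 0, 16, 4)
print(sorted(seen))
```

With the class above, the worker could be something like `lambda lo, hi: research.download_data_from_key(TABLENAME, lo, hi, "out.json")`, where `research` is an `HBaseResearch` instance and the output path is a placeholder. Keeping the number of threads no larger than the connection pool size avoids threads waiting for a free connection.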
