PowerQuery 获取 Word 文档文本内容
第一步定义一个自定义函数fx,关于office文档解析还可以参看https://pbihub.cn/blog/640:
let
fx = (path)=>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),
FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,0),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,0)]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),
each if _ <> 67324752 then BinaryFormat.Record([IsValid = false,Filename=null,Content=null])
else BinaryFormat.Choice(BinaryFormat.Binary(26), // Header payload - 14+4+4+2+2
each BinaryFormat.Record([IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(BinaryFormat.Binary(Header(_)[BinarySize]),
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null)
]),
type binary)),
ZipFormat = BinaryFormat.List(HeaderChoice, each [IsValid]),
Entries=List.Transform(List.RemoveLastN(ZipFormat(File.Contents(path)),1),(e)=>[FileName=e[Filename],Content=e[Content]])
in
Entries
in
fx
第二步调用函数:
let
源 = Table.FromRecords(fx("C:\Users\Administrator\Desktop\测试.docx")),
Content = 源{[FileName="word/document.xml"]}[Content],
xml = Text.FromBinary(Content),
data = Xml.Tables(xml){0}[Table]{0}[Table][r],
txt = List.Transform(List.RemoveNulls(data),each Text.Combine(if [t]{0} is table then List.Transform([t],(x)=>x[#"Element:Text"]{0}) else [t]))
in
txt
大家可以自行修改提取Word中的表格数据...
来广东玩啊,在一声声靓仔声中迷失自己......
查身份证归属地:
Json.Document(Web.Contents("https://api.guaqb.cn/music/id/card.php?id=430103199003081906"))
道高一尺 魔高一丈
https://pbihub.cn/users/44
自由转载-非商用-非衍生-保持署名(创意共享3.0许可证)