PowerQuery 获取 Word 文档文本内容

第一步定义一个自定义函数fx,关于office文档解析还可以参看https://pbihub.cn/blog/640:

let                                         
    fx = (path)=>                                           
let                                         
    Header = BinaryFormat.Record([                                          
        MiscHeader = BinaryFormat.Binary(14),                                           
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),                                          
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),                                          
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,0),                                          
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,0)]),                                            

    HeaderChoice = BinaryFormat.Choice(                                         
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,0),                                           
        each if _ <> 67324752 then BinaryFormat.Record([IsValid = false,Filename=null,Content=null])                                            
                              else BinaryFormat.Choice(BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2                                         
                                       each BinaryFormat.Record([IsValid  = true,                                           
                                                                 Filename = BinaryFormat.Text(Header(_)[FileNameLen]),                                          
                                                                 Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]),                                            
                                                                 Content  = BinaryFormat.Transform(BinaryFormat.Binary(Header(_)[BinarySize]),                                          
                                                                                (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null)                                         
                                                               ]),                                          
                        type binary)),                                          

    ZipFormat = BinaryFormat.List(HeaderChoice, each [IsValid]),                                            
    Entries=List.Transform(List.RemoveLastN(ZipFormat(File.Contents(path)),1),(e)=>[FileName=e[Filename],Content=e[Content]])                                           
in                                          
    Entries                                         
in                                          
fx

第二步调用函数:

let                                                 
    源 = Table.FromRecords(fx("C:\Users\Administrator\Desktop\测试.docx")),                                                    
    Content = 源{[FileName="word/document.xml"]}[Content],                                                   
    xml = Text.FromBinary(Content),                                                 
    data = Xml.Tables(xml){0}[Table]{0}[Table][r],                                                  
    txt = List.Transform(List.RemoveNulls(data),each Text.Combine(if [t]{0} is table then List.Transform([t],(x)=>x[#"Element:Text"]{0}) else [t]))                                                 
in                                                  
    txt 

大家可以自行修改提取Word中的表格数据...

来广东玩啊,在一声声靓仔声中迷失自己......

查身份证归属地:
Json.Document(Web.Contents("https://api.guaqb.cn/music/id/card.php?id=430103199003081906"))

file

道高一尺 魔高一丈
https://pbihub.cn/users/44
M与DAX的恩怨纠葛