Document conversion in Java


In my experience, converting to Word does not produce garbled characters.

1. Writing a file to the response stream

// Write the zip file to the response stream
OutputStream os = null;
try {
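    // Quote the file name: zipName contains a space, which an unquoted header value would cut off
    response.setHeader("Content-disposition", "attachment;filename=\"" + zipName + "\"");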
    os = response.getOutputStream();
    FileUtils.copyFile(new File(zipName), os);
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (os != null) {
        try {
            os.flush();
            os.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    try {
        FileSystemUtils.deleteRecursively(Paths.get(zipName));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
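
The same step can be written more compactly with java.nio. The following is only a minimal sketch under a few assumptions: `FileStreamer` and `streamAndDelete` are hypothetical names, the caller passes in the stream obtained from `response.getOutputStream()`, and `Files.copy` / `Files.deleteIfExists` stand in for the commons-io and Spring helpers used above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original project.
final class FileStreamer {
    // Copies the file to the given stream and returns the number of bytes written.
    // The temporary file is deleted afterwards, even if the copy fails.
    static long streamAndDelete(Path file, OutputStream out) throws IOException {
        try {
            long bytes = Files.copy(file, out);
            out.flush();
            return bytes;
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The sketch leaves the stream open on purpose, because a servlet container normally closes the response stream itself.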

2. ZIP compression stream

        // Package all the doc files
        Calendar calendar = Calendar.getInstance();
        int year = calendar.get(Calendar.YEAR);
        int month = calendar.get(Calendar.MONTH)+1;
        int day = calendar.get(Calendar.DATE);
        int hour = calendar.get(Calendar.HOUR_OF_DAY); // 24-hour clock; Calendar.HOUR is the 12-hour value
        int minute = calendar.get(Calendar.MINUTE);
        String zipName = year + "_" + month + "_" + day + " " + hour + "_" + minute + ".zip";
        File file = new File(uuid);

        File zipFile = new File(zipName);
        File[] listFiles = file.listFiles();
        InputStream input = null;
        ZipOutputStream zipOut = null;
        try {
            zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        try {
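            byte[] buffer = new byte[8192];
            int length;
            for (File listFile : listFiles) {
                try {
                    input = new FileInputStream(listFile);
                    // ZIP entry names always use '/' as the separator, regardless of platform
                    zipOut.putNextEntry(new ZipEntry(file.getName() + "/" + listFile.getName()));
                    // Copy in chunks instead of one byte at a time
                    while ((length = input.read(buffer)) != -1) {
                        zipOut.write(buffer, 0, length);
                    }
                    zipOut.closeEntry();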
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (input != null) {
                        try {
                            input.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }
        } finally {
            if (zipOut != null) {
                try {
                    zipOut.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
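
For comparison, here is a shorter sketch of the same packaging step using try-with-resources, so every stream is closed automatically. It is an illustration under assumptions, not the code from the project: `ZipSketch`, `zipName` and `zipDirectory` are hypothetical names, the timestamp pattern is zero-padded and has no space (unlike the name built above), and sub-directories are simply skipped.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original project.
final class ZipSketch {
    // Builds a timestamp-based archive name such as 2021_03_20_14_05.zip.
    static String zipName(LocalDateTime time) {
        return time.format(DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm")) + ".zip";
    }

    // Packs every regular file directly inside sourceDir into zipFile
    // and returns the number of entries written.
    static int zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        int count = 0;
        try (OutputStream fileOut = Files.newOutputStream(zipFile);
             ZipOutputStream zipOut = new ZipOutputStream(fileOut);
             DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // sub-directories are skipped in this sketch
                }
                // ZIP entry names use '/' as the separator on every platform.
                zipOut.putNextEntry(new ZipEntry(sourceDir.getFileName() + "/" + file.getFileName()));
                Files.copy(file, zipOut);
                zipOut.closeEntry();
                count++;
            }
        }
        return count;
    }
}
```

The archive should be written outside `sourceDir`; otherwise the half-written zip file would be picked up as one of its own entries.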

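Document conversion

Use https://www.e-iceblue.cn/tutorials.html

This document conversion tool has a free edition, and converting to Word works without trouble.

Generating a Word document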

import com.spire.doc.*;
import com.spire.doc.documents.HorizontalAlignment;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.documents.ParagraphStyle;

import java.awt.*;

public class CreateWordDocument {
    public static void main(String[] args){
        // Create the Word document
        Document document = new Document();

        // Add a section
        Section section = document.addSection();

        // Add three paragraphs to the section
        Paragraph para1 = section.addParagraph();
        para1.appendText("滕王阁序");

        Paragraph para2 = section.addParagraph();
        para2.appendText("豫章故郡,洪都新府。星分翼轸,地接衡庐。襟三江而带五湖,控蛮荆而引瓯越。"+
                        "物华天宝,龙光射牛斗之墟;人杰地灵,徐孺下陈蕃之榻。雄州雾列,俊采星驰。台隍枕夷夏之交,宾主尽东南之美。"+
                        "都督阎公之雅望,棨戟遥临;宇文新州之懿范,襜帷暂驻。十旬休假,胜友如云;千里逢迎,高朋满座。"+
                        "腾蛟起凤,孟学士之词宗;紫电青霜,王将军之武库。家君作宰,路出名区;童子何知,躬逢胜饯。");

        Paragraph para3 = section.addParagraph();
        para3.appendText("时维九月,序属三秋。潦水尽而寒潭清,烟光凝而暮山紫。俨骖騑于上路,访风景于崇阿;临帝子之长洲,得天人之旧馆。"+
                        "层峦耸翠,上出重霄;飞阁流丹,下临无地。鹤汀凫渚,穷岛屿之萦回;桂殿兰宫,即冈峦之体势。");

        // Use the first paragraph as the title and define the title style
        ParagraphStyle style1 = new ParagraphStyle(document);
        style1.setName("titleStyle");
        style1.getCharacterFormat().setBold(true);
        style1.getCharacterFormat().setTextColor(Color.BLUE);
        style1.getCharacterFormat().setFontName("宋体");
        style1.getCharacterFormat().setFontSize(12f);
        document.getStyles().add(style1);
        para1.applyStyle("titleStyle");

        // Define the style for the other two paragraphs
        ParagraphStyle style2 = new ParagraphStyle(document);
        style2.setName("paraStyle");
        style2.getCharacterFormat().setFontName("宋体");
        style2.getCharacterFormat().setFontSize(11f);
        document.getStyles().add(style2);
        para2.applyStyle("paraStyle");
        para3.applyStyle("paraStyle");

        // Center the first paragraph
        para1.getFormat().setHorizontalAlignment(HorizontalAlignment.Center);

        // Set the first-line indent of the second and third paragraphs
        para2.getFormat().setFirstLineIndent(25f);
        para3.getFormat().setFirstLineIndent(25f);

        // Set the spacing after the first and second paragraphs
        para1.getFormat().setAfterSpacing(15f);
        para2.getFormat().setAfterSpacing(10f);

        // Save the document
        document.saveToFile("Output.docx", FileFormat.Docx);
    }
}

Word to PDF

import com.spire.doc.*;

public class WordtoPDF {
    public static void main(String[] args) {

        // Load the sample Word document
        Document document = new Document();
        document.loadFromFile("Sample.docx");

        // Save the result as PDF
        document.saveToFile("out/toPDF.pdf", FileFormat.PDF);

    }
}

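That said, the PDF output of this tool can contain garbled characters: Chinese comes out fine, but less common languages are garbled.

What is the fix?

After three days of searching I finally found one:

Use the OpenOffice component. It is cross-platform, which makes it the best choice.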

https://openoffice.apache.org/downloads.html

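Conversion tutorial

Starting the OpenOffice service

Go to the OpenOffice installation directory and start a soffice service from cmd with the following command:

soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;"

Be sure to run this command, and run it from the OpenOffice installation directory (on Windows, soffice.exe normally sits in the program subdirectory of the installation).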
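
Before calling the converter, it can help to check that something is actually listening on that port, since a missing soffice service is the most likely cause of a connection failure. A minimal sketch using only the JDK follows; `OfficePortCheck` and `isListening` are hypothetical names, and the check only proves that a TCP port is open, not that the process behind it is OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of JODConverter or OpenOffice.
final class OfficePortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable is listening.
            return false;
        }
    }
}
```

With the command above, the call would be `OfficePortCheck.isListening("127.0.0.1", 8100, 1000)`.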

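import java.io.File;
import java.net.ConnectException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.StreamOpenOfficeDocumentConverter;

public class PDFDemo {
    public static boolean officeToPDF(String sourceFile, String destFile) {
        OpenOfficeConnection connection = null;
        try {
            File inputFile = new File(sourceFile);
            if (!inputFile.exists()) {
                // Source file not found: return false
                return false;
            }
            // Create the target directory if it does not exist
            File outputFile = new File(destFile);
            if (!outputFile.getParentFile().exists()) {
                outputFile.getParentFile().mkdirs();
            }
            // Delete the target file if it already exists
            if (outputFile.exists()) {
                outputFile.delete();
            }
            DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm");
            connection = new SocketOpenOfficeConnection("127.0.0.1", 8100);
            connection.connect();
            // Log the time at which the OpenOffice connection was established
            System.out.println("Connected at: " + df.format(new Date()));
            DocumentConverter converter = new StreamOpenOfficeDocumentConverter(
                    connection);
            converter.convert(inputFile, outputFile);
            // Log the time at which the Word-to-PDF conversion finished
            System.out.println("Converted at: " + df.format(new Date()));
            return true;
        } catch (ConnectException e) {
            e.printStackTrace();
            System.err.println("Failed to connect to OpenOffice! Check the IP and port");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always release the connection, even if the conversion failed
            if (connection != null && connection.isConnected()) {
                connection.disconnect();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        officeToPDF("E:\\test.docx", "E:\\test.pdf");
    }
}

Maven dependencies: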
<dependency>
    <groupId>com.artofsolving</groupId>
    <artifactId>jodconverter</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>jurt</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>ridl</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>juh</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>unoil</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <version>1.4.3</version>
</dependency>

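Generating Excel

/**
 * Export search results as an Excel file.
 *
 * @param response       the HTTP response the workbook is written to
 * @param searchResponse the search response whose hits are exported
 */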
private void exportForExcel(HttpServletResponse response, SearchResponse searchResponse) {
    SearchHit[] hits = searchResponse.getHits().getHits();
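    // Excel column headers: No., original title, translated title, category, author, source,
    // publication time, original content, translated content
    String[] title = new String[]{"序号", "标题原文", "标题译文", "分类", "作者", "来源",
            "发布时间", "内容原文", "内容译文"};
    // Sheet name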
    String sheetName = "Sheet1";
    String[][] content = new String[hits.length][title.length];
    for (int i = 0; i < hits.length; i++) {
        Map<String, Object> sourceAsMap = hits[i].getSourceAsMap();
        FullTextSearchVo vo = KdUtil.map2Object(sourceAsMap, FullTextSearchVo.class);
        TransField titles = vo.getTitle();
        String originalTitle = titles.getOriginal();
        String transTitle = titles.getTrans();
        List<String> authors = vo.getAuthors();
        String author = "";
        if (authors.size() != 0) {
            author = authors.get(0);
        }
        String website = vo.getWebsite();

        String pubTime = (String) sourceAsMap.get(Constant.PUBTIME);

        TransField contentText = vo.getContent();
        String original = contentText.getOriginal();
        String trans = contentText.getTrans();
        List<CategoryAttributes> categoryAttributes = vo.getCategoryAttributes();
        StringBuilder category = new StringBuilder();
        if (KdUtil.isNotEmpty(categoryAttributes)) {
            for (CategoryAttributes categoryAttribute : categoryAttributes) {
                category.append(categoryAttribute.getCategory()).append(" ");
            }
        }

        content[i][0] = String.valueOf(i + 1);
        content[i][1] = originalTitle;
        content[i][2] = transTitle;
        content[i][3] = category.toString();
        content[i][4] = author;
        content[i][5] = website;
        content[i][6] = pubTime;
        content[i][7] = original;
        content[i][8] = trans;
    }

    String fileName = getTimeByCalendar() + ".xls";
    OutputStream os = null;
    try {
        HSSFWorkbook wb = gethssfworkbook(sheetName, title, content, null);
        fileName = new String(fileName.getBytes(), "ISO8859-1");
        response.setContentType("application/octet-stream;charset=ISO8859-1");
        response.setHeader("Content-Disposition", "attachment;filename=" + fileName);
        response.addHeader("Pragma", "no-cache");
        response.addHeader("Cache-Control", "no-cache");
        os = response.getOutputStream();
        wb.write(os);

    } catch (IOException e) {
        e.printStackTrace();
        // Pass the exception itself as the last argument so the logger records the stack trace
        log.error("exportForExcel({}) error", searchResponse, e);
    } finally {
        if (os != null) {
            try {
                os.flush();
                os.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

private String getTimeByCalendar() {
    String now = TimeUtil.now(TimeUtil.YYYY_MM_DD_HH_MM_SS);
    return now.replace(" ", "_").replace(":", "_");
}

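/**
 * Build an Excel workbook.
 *
 * @param sheetName name of the sheet to create
 * @param title     column headers for the first row
 * @param values    cell values, one inner array per row
 * @param wb        workbook to add the sheet to, or null to create a new one
 * @return the workbook containing the new sheet
 */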
public HSSFWorkbook gethssfworkbook(String sheetName, String[] title, String[][] values, HSSFWorkbook wb) {
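    // Step 1: create an HSSFWorkbook, which corresponds to one Excel file
    if (wb == null) {
        wb = new HSSFWorkbook();
    }
    // Step 2: add a sheet to the workbook, which corresponds to a sheet in the Excel file
    HSSFSheet sheet = wb.createSheet(sheetName);
    // Step 3: add header row 0 to the sheet. Note that the .xls format written by HSSF
    // is limited to 65,536 rows and 256 columns
    HSSFRow row = sheet.createRow(0);
    // Step 4: create the header cells and set their values
    // Declare the cell object
    HSSFCell cell = null;
    // Create the header row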
    for (int i = 0; i < title.length; i++) {
        cell = row.createCell(i);
        cell.setCellValue(title[i]);
    }
    // Create the content rows
    for (int i = 0; i < values.length; i++) {
        row = sheet.createRow(i + 1);
        for (int j = 0; j < values[i].length; j++) {
            // Assign the values to the cells in column order
            row.createCell(j).setCellValue(values[i][j]);
        }
    }
    return wb;
}
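
The export above re-encodes the file name as ISO8859-1, which is fine for a timestamp made of ASCII characters but would mangle a Chinese file name. One common alternative is to send both a plain ASCII `filename` and a percent-encoded `filename*` parameter, as described in RFC 6266. The sketch below is an assumption-laden illustration, not what the project does: `ContentDisposition` and `attachment` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original project.
final class ContentDisposition {
    // Builds a Content-Disposition value carrying an ASCII fallback name
    // plus a UTF-8 percent-encoded name for clients that support filename*.
    static String attachment(String fileName) throws UnsupportedEncodingException {
        // Fallback: replace non-printable-ASCII characters, quotes and backslashes with '_'.
        String ascii = fileName.replaceAll("[^\\x20-\\x7E]|[\"\\\\]", "_");
        // URLEncoder targets HTML forms, so adjust the two characters that differ here.
        String encoded = URLEncoder.encode(fileName, "UTF-8")
                .replace("+", "%20")
                .replace("*", "%2A");
        return "attachment; filename=\"" + ascii + "\"; filename*=UTF-8''" + encoded;
    }
}
```

It would be used as `response.setHeader("Content-Disposition", ContentDisposition.attachment(fileName))` in place of the manual re-encoding.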

Author: anlen123
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit anlen123 when reposting.