I have a model from Hugging Face and would like to use it for performing word comparisons. At first I thought of performing a series of similarity calculations across the words of interest, but I quickly found that the number of pairwise comparisons grows quadratically as the number of words expands.
A solution I thought about is plotting a skip-gram style map where all words land on a two-dimensional plane, so that I can simply cluster the coordinates to find similar words. The problem here is that this requires a BERT model and a low-dimensional embedding layer that can be mapped.
Since I only have a pretrained model, I don't know if I can create a skip-gram from it. I was hoping to calculate the embeddings and, through some transformation, convert them into coordinates that I can plot myself. I don't know, though, whether this is possible or reasonable.
I tried it anyway with the code below:
from sklearn.manifold import TSNE
from transformers import AutoModel, AutoTokenizer
# target word
word = ["Slartibartfast"]
# model setup
model = 'Alibaba-NLP/gte-multilingual-base'
tokenizer = AutoTokenizer.from_pretrained(model)
auto_model = AutoModel.from_pretrained(model, trust_remote_code=True)
# embed and calculate
batch_dict = tokenizer(word, max_length=8192, padding=True, truncation=True, return_tensors='pt')
result = auto_model(**batch_dict)
embeddings = result.last_hidden_state[:, 0]  # CLS-token embedding, shape (1, 768)
# transform to coordinates
clayer = TSNE(n_components=3, learning_rate='auto', init='random', perplexity=50)
embedding_numpy = embeddings.detach().numpy()
clayer.fit_transform(embedding_numpy)  # crashes here saying perplexity must be less than n_samples
1 Answer
After more thorough reading, it was brought to my attention that it would be impossible to use t-SNE in the manner I was hoping, because the coordinates generated by t-SNE are only representative of the data it was fitted on. Fitting again on new data, or trying to transform data that was not in the original set, produces outputs that are not on a comparable range.
I found a replacement for t-SNE called UMAP. UMAP also performs dimensionality reduction, but a fitted reducer can be reused: new data can be transformed into the same coordinate space as the data it was fitted on.
I will explore UMAP and see if it works for what I need.
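A minimal sketch of what that could look like, assuming the same GTE model as in the question and the umap-learn package; the reference word list, the embed helper, and the parameter values here are illustrative choices, not something from the original post:

import torch
import umap  # provided by the umap-learn package
from transformers import AutoModel, AutoTokenizer

model_name = 'Alibaba-NLP/gte-multilingual-base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
auto_model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

def embed(words):
    # tokenize the batch and take the CLS-token vector for each word
    batch = tokenizer(words, max_length=8192, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        output = auto_model(**batch)
    return output.last_hidden_state[:, 0].numpy()

# fit the reducer once on a reference vocabulary (hypothetical word list)
reference_words = ["king", "queen", "apple", "banana", "car", "bicycle"]
reducer = umap.UMAP(n_components=3, n_neighbors=5, random_state=42)
reference_coords = reducer.fit_transform(embed(reference_words))

# later, project new words into the same coordinate space
new_coords = reducer.transform(embed(["Slartibartfast"]))

Unlike scikit-learn's TSNE, which only offers fit_transform, the fitted UMAP object exposes a transform method, so coordinates of new words stay comparable to the reference set.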